DISCLAIMER


This document has been reviewed and approved for publication by the 
Monitoring Branch, Office of Water Regulations and Standards, U.S. Environ- 
mental Protection Agency. Approval does not signify that the contents 
necessarily reflect the view and policies of the Environmental Protection 
Agency, nor does the mention of trade names or commercial products constitute
endorsement or recommendation for use by the U.S. Environmental Protection
Agency or the Fish and Wildlife Service, U.S. Department of the
Interior. 


This report should be cited as: 


Glauz, W. D. 1984. 1982 National fisheries survey. Volume II: Survey
design. U.S. Fish Wildl. Serv. FWS/OBS-84/14. 77 pp.
PREFACE


The National Fisheries Survey was conducted to provide statistically 
valid data on the status of fish communities in the Nation. As a part of 
this effort, a survey design was developed. This work entailed: (1) the
development of a sampling plan; (2) the selection of river reaches to be 
sampled according to this plan; and (3) the development of the statistical 
analysis protocols for carrying out the plan. These activities are de- 
scribed in this report. The initial survey design contemplated a two-phase 
survey. Although the second phase was not conducted, the full design is 
included in this report, both because it may be useful to future surveys 
and because the second phase is an inextricable aspect of the design. 


This report is the second in a three-volume series and is intended for 
use by professional fishery biologists and water quality management person- 
nel; Federal and State decisionmakers and planners; and the general public. 
Volume I presents the initial findings of the Survey including data sum- 
maries, analysis, and interpretations (Judy, R. D., Jr., P. N. Seeley, 
T. M. Murray, S. C. Svirsky, M. R. Whitworth, and L. S. Ischinger. 1984. 
1982 National Fisheries Survey. Volume I: Initial findings. U.S. Fish 
Wildl. Serv., FWS/OBS-84/06). Volume III contains the survey protocol and
data handling activities (Judy, R. D., Jr., and P. N. Seeley. 1984.
1982 National Fisheries Survey. Volume III: Survey protocol. U.S. Fish
Wildl. Serv., FWS/OBS-84/07).
SECTION 1. INTRODUCTION


The Monitoring and Data Support Division of the U.S. Environmental 
Protection Agency (EPA) has developed a draft water monitoring strategy 
with three goals (Leutner et al. 1981): 


(1) Describe the environmental quality of the Nation's 
waters by 1984 in terms of the Water Quality Ob- 
jective of the Clean Water Act - chemical, phys- 
ical, and biological integrity, 


(2) Determine the reasons for the water quality prob- 
lems that remain after 1984, and, 


(3) Describe the effectiveness of current EPA control 
programs in terms of water quality and estimate 
the effectiveness of potential future programs to 
address remaining problems.


The three elements, chemical, physical, and biological integrity, together 
form the goal of "water quality," as adopted by EPA.


It has been noted that (EPA Biointegrity Working Group 1981): 


The integrity of water is practically defined as the 
degree to which waters provide for beneficial uses in- 
cluding: (1) maintenance of indigenous, balanced popu- 
lations of fish, shellfish and other forms of aquatic 
life; (2) drinking water supply and other purposes re- 
lated to human health; and (3) agricultural, industrial 
and other miscellaneous uses. 


Biological quality is difficult to define, let alone assess precisely. 
Although chemical and physical measurements are commonly made that provide 
a great deal of useful information about water quality, these measurements 
cannot be used with certainty as surrogates for biological quality. Further,
they cannot be used to directly predict the degree to which waters
will maintain balanced populations of aquatic life. 


Two approaches to biological assessment have been used (EPA Biointeg- 
rity Working Group 1981): 


1. Community-level analyses: Sampling cross-sections 
of entire aquatic communities including fish, 
macroinvertebrates, periphyton, zooplankton, and
phytoplankton. 
2. Sub-community-level analyses: Sampling one element
of aquatic communities such as benthic macroinvertebrates,
periphyton, or fish.


Most biological assessment studies have been at the subcommunity level because
of the great resources, in terms of dollars and expertise, required
for community-level analyses. These studies most often have sampled algae
and benthic macroinvertebrates.


Monitoring of fish communities has been strongly recommended for many 
reasons (EPA Biointegrity Working Group 1981): 


(1) Compared with other aquatic organisms, the envi- 
ronmental tolerances and competitive interactions 
of many fish are better known; 


(2) A number of trophic levels are normally represented
in fish communities (i.e., carnivorous,
herbivorous, omnivorous) and their relative domi- 
nance provides insight into the quality of the 
community; 


(3) Fish are dependent on other forms of aquatic life, 
therefore, the health of the fish community re- 
flects the condition of the entire aquatic commu- 
nity to some degree but changes in algae and in- 
vertebrates are difficult to relate to fish; 


(4) The size structure of populations and the growth 
rates of individuals integrate the long-term ef- 
fects of stressors; 


(5) Most fish species in perennial warm water streams 
migrate very little; 


(6) Nearly all fish species can be identified at the 
field site; 


(7) The general public can relate much easier to 
statements about conditions of the fish community 
than they can about any other segment of aquatic 
communities; and, 


(8) The results can be directly related to the Congressional
mandate (to "provide for the protection
and propagation of a balanced population of shellfish,
fish, and wildlife, and allow recreational
activities in and on the water").


The EPA and the U.S. Fish and Wildlife Service (FWS) jointly undertook 
an effort to design, conduct, and analyze a survey of the biological qual- 
ity of the Nation's waters in order to address a portion of the first goal 
of the water-monitoring strategy. Personnel from EPA, FWS, and several
contractors took part in the survey. Biological quality was determined
by assessing the fish community in randomly selected river reaches.
The assessment was made by fisheries experts most knowledgeable about the 
selected reaches. 


The overall survey-design concepts are presented in Section 2. Although
the final survey design is a stratified, two-stage design with two phases,
only the first phase was actually implemented. The sample size considerations
for this design are given in Section 3. The first-stage sample design is
presented in some detail in Section 4, while the second stage is summarized
in Section 5. The second-phase features of the design are covered in
Section 6, and the statistical analysis protocols are given in Section 7.


Every effort has been made to minimize the mathematical details in the
body of this report. The equations are included in the Appendices for
interested readers.
SECTION 2. OVERALL SURVEY DESIGN CONCEPTS


DATA REQUIREMENTS 


The intent of the survey was to provide data needed to make unbiased 
national estimates of the biological quality of the Nation's flowing waters. 
The data were to be obtained from a probability sample of these waters,
selected so as to allow the use of statistical estimating procedures to make
unbiased national estimates. 


For each sample, a knowledgeable State fisheries biologist would be 
identified and asked to complete a questionnaire. The questionnaire (sur- 
vey instrument) contained questions about the water body that the biologist 
was likely to be able to answer without doing any field work. The ques- 
tions dealt, for example, with the species of fish in the water, the qual- 
ity of the water, how the water body changed in the past 5 years with re- 
gard to support of the fish community, and how the water body is expected 
to change in the next 5 years. The survey, termed the National Fisheries 
Survey, was part of a larger effort, the Aquatic Life Survey. 


This section of the report includes general survey concepts, shows how 
these concepts were applied to sampling the Nation's waters, and outlines 
the final design. 





SAMPLE UNIVERSE CONCEPTS 


If a statistical survey is to be valid, it must start with an identifiable
and quantifiable population or "universe" and a listing of its members
(the "sampling frame"). Conducting a survey of the inhabitants of a city,
for example, requires a directory of all the inhabitants. A telephone di- 
rectory is often used for such purposes, but then the survey is not of the 
inhabitants, but of the families who own telephones and choose to have 
their numbers listed. 


The sampling frame must be complete (e.g., no inhabitant's name is
missing) and it must be accurate (e.g., no inhabitant should be listed more
than once). The latter requirement is needed to ensure that there is no
bias in the sample selection. (Inhabitants listed more than once would
have a greater chance of being selected than inhabitants listed only once.)


Finally, the sampling frame should be clearly describable. Using
the previous example, possible sampling frames might be:











(1) All persons with listed phone numbers in the 1982 
directory of Anyplace, U.S.A.; 


(2) All persons living within the city limits of Anyplace,
U.S.A., who were at least 21 years old as
of July 1, 1982; or


(3) All persons with a home mailing address within the 
city limits of Anyplace, U.S.A. 


WATER SAMPLE UNIVERSE 
Reaches 


In concept at least, the population or universe of interest in this 
survey consists of all of the "Nation's flowing waters." While the concept
is reasonably straightforward, a more detailed definition is required to 
define the population in a fully operational sense. 


The applicable statistical theory requires that the conceptual popula- 
tion consists of individually recognizable elements or units. Given an 
understanding of exactly what the units are, the population definition spe- 
cifies which units belong to the population and, if there is some doubt, 
which do not. With reference to this survey, individually recognizable 
units were sought which, in the aggregate, would provide an operational 
definition of "the Nation's flowing waters." 


The Monitoring and Data Support Division (MDSD) of the U.S. Environ- 
mental Protection Agency developed a cataloging system in which each body 
of surface water is segmented according to a well-defined procedure. The
resulting segments, called reaches, are cataloged with unique code numbers. 
The cataloging system, called the River Reach File, facilitates the aggre- 
gation of reaches into river systems, watersheds, and other hydrologic 
units of interest. 





The definition of a reach used for this survey follows that developed 
by the MDSD for construction of the River Reach File. According to this 
definition (Horn 1981): 


Most reaches represent the approximate centerlines of 
streams and extend between points of confluence with 
other streams. The reaches constructed within open 
waters are generally straight lines connecting tribu- 
tary streams with assumed transport paths through the 
open waters. 


This definition was expanded to include the following additional 
points. A reach represents the approximate centerline of a body of water 
and begins at a point of confluence with another body of water or at its 
origin, when no such point exists. The reach ends at the next point of
confluence with another body of water or at its terminus when no such point
exists. For a body of water that has no point of confluence with another 
body of water, a reach represents its approximate centerline extended across 
its widest expanse. 


Cataloging Units 


The EPA River Reach File is an extension of the U.S. Geological Survey 
(USGS) Hydrologic Unit System. In this system, the geographic area of the 
United States is partitioned into hydrologic units in the following manner. 
There are 21 major water basins (river basins having drainage areas greater 
than 700 mi²) in the United States (Table 1). Within the major water basins
(also called hydrologic regions), the Water Resources Council has
designated 222 subregions. A subregion includes that area drained by a 
river system, a reach of a river and its tributaries in that reach, a 
closed basin(s), or a group of streams forming a coastal drainage area 
(Water Resources Council 1970). Within the subregions are the accounting 
units of the National Water Network; within the accounting units, there 
are over 2,100 cataloging units (CU's) (U.S. Geological Survey - ). Ac- 
counting units and cataloging units are based on watershed configuration 
and size. 








Table 1. Major U.S. water basins


No.   Water Basin

 1.   New England
 2.   Mid-Atlantic
 3.   South Atlantic - Gulf
 4.   Great Lakes
 5.   Ohio
 6.   Tennessee
 7.   Upper Mississippi
 8.   Lower Mississippi
 9.   Souris - White - Red
10.   Missouri
11.   Arkansas - White - Red
12.   Texas - Gulf
13.   Rio Grande
14.   Upper Colorado
15.   Lower Colorado
16.   Great Basin
17.   Pacific Northwest
18.   California
19.   Alaska
20.   Hawaii
21.   Caribbean














The USGS, in cooperation with the Water Resources Council, prepared a 
series of nationally consistent State Hydrologic Unit Maps that delineate 
all of the hydrologic units in the system described above. The maps assign 
a unique 8-digit code to each cataloging unit, from which each of the 
larger hydrologic units in which it is contained can be identified. Spe- 
cifically, the 8-digit identifier consists of four pairs of digits with the 
first through fourth pairs designating, respectively, hydrologic region, 
hydrologic subregion, accounting unit, and cataloging unit within account- 
ing unit. 
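

The nesting of the 8-digit identifier can be illustrated with a short Python sketch. The function name and the dictionary keys are hypothetical and are not part of the USGS system or the River Reach File; the only rule taken from the text is that the four digit pairs designate, in order, region, subregion, accounting unit, and cataloging unit.

```python
def parse_cataloging_unit(code: str) -> dict:
    """Split an 8-digit cataloging unit code into its four nested levels.

    Each level is identified by the digit pairs up to and including its
    own pair, so the subregion key carries the region prefix, and so on.
    """
    if len(code) != 8 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected an 8-digit code, got {code!r}")
    return {
        "region": code[:2],           # first pair
        "subregion": code[:4],        # first two pairs
        "accounting_unit": code[:6],  # first three pairs
        "cataloging_unit": code,      # all four pairs
    }


# Example using a code listed in Table 2 (Delaware Bay).
levels = parse_cataloging_unit("02040204")
print(levels)
```

Keeping the code as a string, rather than converting it to an integer, preserves the leading zeros that identify Regions 01 through 09.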


The EPA River Reach File extends the USGS system by designating 
reaches within cataloging units. A reach number is assigned to each reach 
within a cataloging unit, such that reaches are uniquely identified (na- 
tionally) when the reach number is combined with the cataloging unit number 
(code). The cataloging unit number is included in the record of each of 
the approximately 68,000 reaches in the current River Reach File. Every 
cataloging unit in the 48 contiguous States is represented in the file by 
one or more reaches. 


Eligible Reaches 





The sampling frame for this survey is defined as those reaches of 
rivers and streams: 


(1) contained in the 48 contiguous States; 
(2) shown on 1:500,000 U.S. Geological Survey maps; 


(3) including watercourses shown on the maps as being 
seasonally intermittent, impoundments, reservoirs, 
canals, and constructed channels and waterways; 
and 


(4) excluding the Great Lakes and other lakes, marine 
waters, estuaries, and wetlands. 


Cataloging units known to include only ineligible reaches (according 
to item 4 above) were removed from the sampling frame, as were cataloging 
units outside the 48 contiguous States. The specific cataloging units that 
were excluded are listed in Table 2. The sampling frame that completely 
defines the geographic area of interest thus contains reaches in the re- 
maining 2,101 cataloging units. 


Operational Definitions 





Having removed the cataloging units that contain only ineligible 
reaches from the frame, it is still necessary to remove individual ineli- 
gible reaches from within the other CU's. The ineligible reaches that need 
to be removed include lakes, marine and estuarine reaches, and wetlands. 
Reaches within reservoirs and impoundments are eligible, as are reaches in 
which the flow is intermittent. Constructed reaches, such as canals, are 
also eligible unless excluded for some other reason. 
Table 2. Cataloging units that were excluded from the
first-stage sampling frame


Cataloging unit   Location

01100007          Long Island Sound
02040204          Delaware Bay
02060001          Chesapeake Bay
02080101          Chesapeake Bay
03090203          Florida Bay
04020300          Lake Superior
04060200          Lake Michigan
04080300          Lake Huron
04120200          Lake Erie
04150200          Lake Ontario
18060014          Santa Rosa Island and Santa Cruz Island
18070107          Santa Catalina Island


*In addition, all cataloging units from Regions 19, 20,
and 21 were excluded.


Except for marine and estuarine reaches, the identification of ineli- 
gible reaches for frame-construction purposes was made on the basis of map 
symbols. Reference maps for this purpose were USGS maps with a 1:500,000 
scale, such as the state-level Hydrologic Unit Map Series. Reaches not 
shown on these reference maps were, by definition, considered ineligible. 


The map symbols were ambiguous in some cases, and some doubt remained 
about including a particular reach in the frame. All such reaches were 
initially included in the frame. Ineligible reaches improperly maintained 
in the sampling frame and subsequently selected into the sample were iden- 
tified during the collection of the sample data and excluded during the 
data analysis. Including some ineligible reaches in the frame tended to 
increase sampling variances, but avoided biases. 


Marine and estuarine reaches are, of course, not identifiable by map 
symbols. These habitats are commonly defined in terms of tidal influences, 
salinity, or some related measure. These types of definitions generally 
result in a temporally varying boundary, which limits the utility of using 
such definitions. In any event, the information necessary for using these 
definitions for frame-construction purposes is not available. 
Therefore, an operational definition that excludes marine and estuarine
reaches was developed; it is given below. The goal behind the operational
definition is to specify a set of measurements, which can be
easily made using the reference maps, that can be used to determine the 
eligibility status of a given reach. The measurements are based on the 
width of a river system in relation to the distance from the confluence of 
the river system with an oceanic coast, including the ocean proper, coastal 
sounds, bays, straits, gulfs, and inlets. The definitions are presented in 
terms of measurements taken in inches from the 1:500,000 maps. 


The first criteria applied in defining coastal reaches were:


1. A cataloging unit is defined as a coastal cataloging unit if it
has a boundary that is partially undefined on the reference maps. 


2. A coastal reach is defined as any reach having at least part of
its length in a coastal cataloging unit or in a cataloging unit 
having any part of its boundary in common with one or more 
coastal cataloging units. 


The remaining criteria were applied to coastal reaches, as defined above. 
Two possibilities were identified: (1) a river system in a coastal cata- 
loging unit with a terminus at an oceanic coast; or (2) a river system that 
crosses a cataloging unit boundary into a coastal cataloging unit. The 
width of the river system was measured at these points (i.e., at the ter- 
minus of the system with an oceanic coast or at the point of crossing a 
boundary with a coastal cataloging unit). 


Identifying the oceanic terminus of a river system involves some arbi- 
trary judgment, regardless of the rules employed. Complications arise due 
to offshore island chains, river deltas, salt marshes, and similar formations.
The following "rules" were used to identify a terminus:


3. A river system is clearly contained between two continuous shore- 
lines or river banks at some point upstream from the oceanic 
terminus. The terminus is defined as the point where the shore- 
lines are no longer opposing, become discontinuous (as in island 
chains), or the channel of the river system divides to create 
more than one pair of opposing shorelines (as occurs in a delta). 


The width of a reach at the oceanic terminus or at the boundary with a 
coastal cataloging unit is measured at right angles to the longitudinal 
axis of the reach at that point: 


4. If the width of the reach on the map is greater than 1/8 inch 
(i.e., 0.986 mile), the (approximate) longitudinal axis of the 
river system is followed upstream to the point where the width, 
measured as above, first equals 1/8 inch or until a distance of 
5 inches (i.e., 39.457 miles), measured along the longitudinal 
axis of the river system, has been traversed. 
5. If the width of the reach is less than 1/8 inch, and the reach is
contained in a coastal cataloging unit, the longitudinal axis of 
the river system is followed upstream for a distance of 1 inch 
(i.e., 7.891 miles). 


All reaches contained in the river system downstream from the points iden- 
tified by 4 and 5 are ineligible. 


6. If the width of the reach is less than 1/8 inch, and the reach is 
not contained in a coastal cataloging unit, the reach is eligi- 
ble. 


The boundaries defined by 4 and 5 are unlikely to coincide with reach 
boundaries. Given the difficulty of describing boundaries other than reach 
boundaries to the biologists participating in the survey, the determination 
was made that any reach with part of its length classified as eligible
under the above procedure was considered entirely eligible. 
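

The mileages quoted in rules 4 and 5 follow directly from the 1:500,000 map scale. The Python sketch below reproduces that conversion and encodes the width test as a small helper. The function names are hypothetical, and the helper returns only the maximum distance followed upstream; it does not model the earlier stopping point in rule 4 where the width first equals 1/8 inch. A width of exactly 1/8 inch is not addressed by the rules as written, so the helper treats it as an error rather than guessing.

```python
INCHES_PER_MILE = 63_360   # 5,280 feet x 12 inches
MAP_SCALE = 500_000        # 1 map inch represents 500,000 ground inches
WIDTH_THRESHOLD_IN = 1 / 8


def map_inches_to_miles(inches: float) -> float:
    """Convert a distance measured on a 1:500,000 map to ground miles."""
    return inches * MAP_SCALE / INCHES_PER_MILE


def max_upstream_exclusion(width_in: float, in_coastal_cu: bool) -> float:
    """Maximum distance (map inches) followed upstream under rules 4 to 6."""
    if width_in > WIDTH_THRESHOLD_IN:
        return 5.0                                # rule 4 (upper limit)
    if width_in < WIDTH_THRESHOLD_IN:
        return 1.0 if in_coastal_cu else 0.0      # rule 5 or rule 6
    raise ValueError("a width of exactly 1/8 inch is not covered by the rules")


for inches in (1 / 8, 1, 5):
    print(f"{inches} in. on the map = {map_inches_to_miles(inches):.3f} miles")
```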


TWO-STAGE DESIGN 


At the time of the survey design work the River Reach File contained
about 68,000 reaches. However, some of the reaches were not units in the 
population of interest to the survey, and many reaches in the population of 
interest were not in the file. (A complete River Reach File would include 
an estimated 179,000 reaches.) Sampling frames need to be complete, in the 
sense of including every unit in the population, if biases in the parameter 
estimates are to be avoided. Sampling efficiency is increased when the 
frame does not contain units that are not members of the population. 
Therefore, modifications to the River Reach File were required before the 
sample could be selected. Deleting ineligible reaches and adding missing 
reaches to the File was not feasible because of both time and cost con- 
straints. 


Instead, the decision was made to select the sample of reaches in two 
stages. Under this design, a first-stage sample of cataloging units was 
selected. Then, a complete file of all reaches in these cataloging units 
was created. Finally, a second-stage probability sample of reaches was se- 
lected from the first-stage sampling frame. The two-stage sampling pro- 
cedure reduces the frame construction problem to manageable proportions. 
The overall sampling process is illustrated in Figure 1.


Control over the distribution of the first-stage sample was provided 
by stratification of the area frame. Stratification variables were defined 
in terms of urban/industrial development, climate, and agricultural land 
use. The first-stage sample was allocated proportionately across strata, 
and cataloging units were selected with probability proportional to the 
total length of reaches contained in the unit. 
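

Selection with probability proportional to size can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cataloging unit codes and reach lengths are made up, the function name is hypothetical, and the draw is made with replacement using the standard library, which may differ from the selection scheme actually used in the survey (described in Section 4).

```python
import random


def select_pps(reach_miles: dict, n: int, seed=None) -> list:
    """Draw n cataloging units with probability proportional to reach length.

    reach_miles maps a cataloging unit code to the total length of its
    reaches. Draws are independent (with replacement), so a long unit can
    be selected more than once.
    """
    rng = random.Random(seed)
    codes = list(reach_miles)
    weights = [reach_miles[code] for code in codes]
    return rng.choices(codes, weights=weights, k=n)


# Hypothetical stratum with three cataloging units (lengths in miles).
stratum = {"10010001": 420.0, "10010002": 150.0, "10010003": 30.0}
print(select_pps(stratum, n=5, seed=1984))
```

Under this scheme a unit holding 70% of the stratum's reach length has a 70% chance of being chosen on each draw, which is what keeps the later per-unit reach sample from over-representing small units.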


A separate stratum of reaches of large river systems was also sampled. 
This was actually obtained by a single stage of sampling, but can be 
thought of as a two-stage process where the first stage was the universe of 
reaches of large river systems, rather than a sample. 


Figure 1. Schematic of the information collection procedure


TWO-PHASE DESIGN


The survey plan was to contact state-level fisheries biologists by 
mail and ask them to provide the information sought for one or more sample
reaches. Respondent selection procedures were to be developed to ensure 
that the biologists most likely to have knowledge of, or access to, the 
required information were identified for each sample reach. The procedures 
were to be developed by the FWS with the cooperation of several State fish 
and game agencies. 


Because the respondent selection procedures provided no guarantee that 
the designated biologist necessarily possessed accurate (or indeed any) in- 
formation concerning the sample reach, a second phase of data collection 
was considered. A two-phase design is typically employed to fully account 
for missing data and possibly other types of response biases (measurement 
errors). In the second phase, a subsample of reaches was to be selected 
from the original, or phase-one, sample and remeasured by data-collection
procedures developed to provide the required information for any reach.
Combining the phase-one and phase-two information in the form of a difference
estimator would provide linear statistics, which are unbiased estimates
of corresponding population parameters. However, when the potential
for measurement biases associated with the phase-one data collection pro- 
cedures can be ignored, the phase-two design is not needed. 
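

The idea of a difference estimator can be sketched as follows. This is a simplified, equal-probability illustration and not the estimator derived in the Appendices: the function name and the data are hypothetical, and the sampling weights implied by the stratified two-stage design are ignored. The phase-one mean over the full sample is corrected by the average difference between phase-two and phase-one values observed on the subsample.

```python
def difference_estimate(phase1_all, phase1_sub, phase2_sub):
    """Two-phase difference estimate of a mean (equal-weight sketch).

    phase1_all : phase-one values for every reach in the phase-one sample
    phase1_sub : phase-one values for the reaches in the phase-two subsample
    phase2_sub : remeasured (phase-two) values for those same reaches
    """
    if len(phase1_sub) != len(phase2_sub):
        raise ValueError("subsample lists must be paired reach by reach")
    phase1_mean = sum(phase1_all) / len(phase1_all)
    mean_difference = sum(
        second - first for first, second in zip(phase1_sub, phase2_sub)
    ) / len(phase1_sub)
    return phase1_mean + mean_difference


# Hypothetical 0/1 responses ("reach does not support sport fish").
phase1_all = [1, 0, 1, 1, 0, 0, 1, 0]   # mailed questionnaire, 8 reaches
phase1_sub = [1, 0, 1, 0]               # questionnaire values, 4 remeasured reaches
phase2_sub = [1, 0, 0, 0]               # remeasured values for the same reaches
print(difference_estimate(phase1_all, phase1_sub, phase2_sub))
```

When the phase-one responses carry no systematic error, the average difference is near zero and the estimate reduces to the phase-one mean, which is why the second phase can be dropped if measurement bias can be ignored.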


Although the study design included a second phase, cost and time limi- 
tations precluded its implementation for the National Fisheries Survey. 
Future surveys could, however, utilize the complete design. 


SUMMARY 


The survey design was a stratified, two-stage, two-phase design to ob- 
tain information from a probability sample of river reaches. Estimation 
procedures were developed in accordance with the design to enable the sur- 
vey data to be analyzed to obtain unbiased national estimates of the param- 
eters of interest. The second phase of the design was not implemented. 
Therefore, alternative estimation procedures were developed using only the 
phase-one design. 


SECTION 3. SAMPLE SIZES


OVERVIEW 


Given the objectives of the survey and the essential features of the 
survey design, the next step in designing the survey was to determine the 
sample size and the allocation of the sample across the strata needed to 
provide parameter estimates having the required precision. For this pur- 
pose, equations were developed that approximated the sampling variance as a 
function of the important design features and the sample sizes. The equa- 
tions were then solved, subject to the variance constraint implied by the 
precision requirement, to obtain the needed sample sizes. 


In general, the sample size to be used in a survey depends on the sur- 
vey design, the expected results to be estimated by the survey, and the de- 
sired precision of the estimates. For example, the desired information 
might be the fraction of the Nation's reaches that do not support sport 
fish. From other information, it might be expected (hypothetically, in 
this example) that 20% of the reaches are in this category. In the nota- 
tion used subsequently, this is represented by the proportion P = 0.2. 
Further, the estimate might require a precision specified by the relative 
standard error, or ratio of standard deviation to mean, of A = 0.10. Under
these assumptions, if the survey produces an estimate for P of 0.18, the
95% confidence bounds are 0.18 (1 ± 2A), or between 0.144 and 0.216.
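

The arithmetic behind these bounds can be checked with a short Python sketch. The function name is hypothetical; the multiplier of 2 is the approximate 95% factor used in the text.

```python
def approximate_bounds(estimate: float, rel_se: float, multiplier: float = 2.0):
    """Return (lower, upper) bounds as estimate * (1 -/+ multiplier * rel_se)."""
    half_width = multiplier * rel_se * estimate
    return estimate - half_width, estimate + half_width


# Values from the example in the text: estimate of P = 0.18, A = 0.10.
lower, upper = approximate_bounds(0.18, 0.10)
print(f"{lower:.3f} to {upper:.3f}")
```

Because the precision is specified in relative terms, the width of the interval scales with the estimate itself: the same A = 0.10 applied to an estimate of 0.50 would give bounds of 0.40 and 0.60.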


Because the survey questionnaire contained many questions, there are 
many possible values of P. For design purposes it was assumed that a sin- 
gle value of P would be used, corresponding to the estimate of the param- 
eter(s) of major concern. 


The sample design of this survey can be summarized as a stratified, 
two-stage, two-phase design. Specifically, the total allocation problem 
addresses three issues. The first issue is the total sample size; that is, 
the number of sample reaches required to satisfy the precision levels as- 
sociated with the parameter estimates of major concern. The second issue 
is the allocation of the total sample size between the two stages of the 
first-phase sample and between the two phases of the total design. The 
third issue is the allocation of the sample to the strata imposed on the 
first-stage area frame and the three strata defined on completion of the 
phase-one data collection activities, called post-strata. 


The equations needed and their derivation are given in Appendix A. 
Recommendations for each of the three issues were developed and are summar- 
ized in Appendix B. 
Many sample size computations were required to investigate sensitivities
to various assumptions because there were so many factors to be considered
and most factors initially could only be approximated. Moreover,
the equations developed required computerization in order to be solved ef- 
ficiently. The factors that needed to be considered and their values were: 


Factors                                                      Values

Expected result (the proportion, P)                          0.20
Relative standard error of the estimate, A                   0.10
Correlation among reaches in the same cataloging unit, p     0.05
Sampling cost ratio, 1st to 2nd phase                        0.10
Sample selection cost ratio, 1st to 2nd stage                2.00
Number of cataloging units in 1st stage                      218
Number of reaches in sample (Phase 1)                        1,308
Number of reaches per cataloging unit in 2nd stage           6


These values formed the basis for the sample selection process. As 
discussed below, the values were adjusted only slightly as the process was 
carried out. 


SAMPLE ALLOCATION 


Sample Sizes 





The allocation of the sample to the strata imposed on the first-stage 
area frame and the allocation of the phase-two subsample to the post-strata 
were the next items to be considered in the survey design. The first-stage 
stratification, introduced in Section 2, is described in detail in Section 4.
One aspect of this stratification was the desirability of identifying
a stratum that consisted of the Nation's largest rivers. It was,
therefore, necessary to distinguish between this large river stratum and 
the remainder of the area frame. Ninety-eight cataloging units were clas- 
sified into the large river stratum. Because sample reaches were directly 
selected from the large river stratum in a single stage of sampling, the 
number of cataloging units in the large river stratum was added to the 218 
other first-stage cataloging units, for a desired total of 316 cataloging 
units. On the other hand, the number of reaches selected from the large 
river stratum was logically a portion of the desired total sample size of 
1,308 reaches. 


The total sample size was allocated between the large river stratum 
and all the other strata in proportion to the relative size of each stratum,
where stratum sizes were measured as the total length (miles) of reaches in 
each stratum. The 98 cataloging units making up the large river stratum 
contained approximately 6.07% of the total reach length; therefore, 79 
reaches should have been allocated to this stratum. 


However, numerical inaccuracies at the time of the initial calculations
resulted in the large river stratum being credited with 6.46% of the
total reach length, which, in turn, resulted in an allocation of 84 reaches
to this stratum and 1,224 (1,308 - 84) reaches for the remainder of the
strata. When this discrepancy was identified, selection of the first-stage 
sample was complete and the second-stage frame was under construction. 
Therefore, it was not feasible to redo the allocations. Furthermore, the 
shifting in the allocation of 5 reaches out of some 1,300 is not likely to 
lead to perceptible changes in the conclusions. 


It was decided to retain the 1,224 nonlarge river reaches but change 
the allocation of large river reaches from 84 to 79, reducing the total 
sample size by 5 reaches to 1,303. The resulting number of nonlarge river 
reaches per cataloging unit (1,224/218 = 5.61) is not an integer. Keeping 
the number of reaches per cataloging unit at 6 in the remainder of the area
frame requires a sample allocation of:


1,224/6 = 204


cataloging units for this part of the area frame, rather than 218.


The resulting allocation, including the large river stratum, is sum- 
marized below: 








                Total          Large            Remainder
Units           sample size    river stratum    of area frame

Reaches         1,303          79               1,224
CU's            302            98               204
Reaches/CU      4.31           0.81             6
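The arithmetic behind this summary can be retraced with a short sketch. It is only an illustration: the constants are the figures quoted in the text and in Table 5, while the variable names and the use of ordinary rounding are assumptions made for the example.

```python
# Figures quoted in the text and in Table 5 of this report.
TOTAL_REACHES_PLANNED = 1308   # desired total sample size (reaches)
LARGE_RIVER_SIZE = 6751.00     # size measure of the large river stratum
TOTAL_SIZE = 111263.75         # size measure of the whole area frame
NONLARGE_REACHES = 1224        # reaches retained for the other strata
REACHES_PER_CU = 6             # reaches per nonlarge-river cataloging unit
LARGE_RIVER_CUS = 98           # cataloging units in the large river stratum

# Share of total reach length in the large river stratum (about 6.07%).
share = LARGE_RIVER_SIZE / TOTAL_SIZE

# Proportional allocation of reaches to the large river stratum.
large_reaches = round(TOTAL_REACHES_PLANNED * share)

# Cataloging units needed so that each carries exactly six reaches.
nonlarge_cus = NONLARGE_REACHES // REACHES_PER_CU

total_reaches = large_reaches + NONLARGE_REACHES
total_cus = LARGE_RIVER_CUS + nonlarge_cus

print(f"share = {100 * share:.2f}%")
print(f"reaches: {total_reaches} = {large_reaches} + {NONLARGE_REACHES}")
print(f"CU's:    {total_cus} = {LARGE_RIVER_CUS} + {nonlarge_cus}")
print(f"reaches/CU: {total_reaches / total_cus:.2f}, "
      f"{large_reaches / LARGE_RIVER_CUS:.2f}, {REACHES_PER_CU}")
```

Replacing the share with the erroneous 6.46% gives 84 reaches under the same rounding, which is the figure used in the initial calculations.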


Allocation to First-Stage Strata 





The first-stage sample of 204 CU's was allocated to the strata in
proportion to the relative size of each stratum. Size, for this purpose, was
defined as the total length of the reaches in each CU. On this basis, the
number of CU's to be selected in each stratum, h, was determined as:


$$ n_1(h) = 204 \, \frac{L(h)}{\sum_h L(h)} $$


where L(h) is the total length of reaches in all CU's in stratum h.
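As a check on the allocation rule, the following sketch applies it to the stratum size measures listed in Table 5. The dictionary and function names are illustrative assumptions; only the size measures and the total of 204 CU's come from the report.

```python
# Size measures for strata 1 through 13, as listed in Table 5.
STRATUM_SIZES = {
    1: 8009.00, 2: 8000.25, 3: 8023.50, 4: 8025.75, 5: 8041.75,
    6: 8049.50, 7: 7933.50, 8: 7944.75, 9: 7913.50, 10: 7989.75,
    11: 8194.50, 12: 8241.25, 13: 8145.75,
}


def allocate_first_stage(sizes, n_units=204):
    """Allocate n_units CU's to strata in proportion to stratum size L(h)."""
    total = sum(sizes.values())
    return {h: n_units * size / total for h, size in sizes.items()}


allocation = allocate_first_stage(STRATUM_SIZES)
for h, n_h in allocation.items():
    print(f"stratum {h:2d}: {n_h:.4f}")
```

The printed values can be compared with the sample allocation column of Table 5. They are not integers, which is why the replication step in Section 4 assigns probabilities to the neighboring integer values.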


Allocation to Post-Strata 





Estimates of the three post-strata sizes can, of course, be computed 
from the phase-one sample data. In general, these values are different for 
the individual items on the questionnaire. While it might be possible to 
design more or less separate phase-two procedures for each questionnaire 
item, to do so seems excessive. An alternate procedure is to estimate the 
post-stratum sizes for a few key variables and average the results. 
















However, it is not efficient to allocate the phase-two samples according
to the relative post-strata sizes. Rather, they should be allocated
more heavily to the strata with the larger probabilities of phase-one
misclassification errors. Let $S_d^2(j)$ be the population-level variance of
the differences, d(g) = x(g) - y(g), for post-stratum j. If the assumptions
made in Appendix B about the probabilities of misclassification errors are
correct, then the $S_d(j)$ are roughly in the ratio 0.1, 0.4, and 0.5,
respectively, for j = 1, 2, and 3. These values were used to allocate
reaches to the post-strata.


SECTION 4. FIRST-STAGE DESIGN


INTRODUCTION 


The first-stage sample, other than the large river stratum, was allo- 
cated proportionately across strata, and sampling units were selected with 
probability proportional to the total length of reaches contained in the 
unit. This section discusses, first, how the size measure (reach length) 
was developed. Then, the stratification of CU's using this size measure is 
described, and the final stratification plan is presented. How the sample 
sizes for each stratum were divided for purposes of replication is covered
next, followed by the actual probability sampling procedure used to select
the first-stage sample. The list of the 302 CU's selected, together with
their stratification and other properties, is contained in Appendix C. 
Appendix D provides the mathematics of the selection procedure. 


CONSTRUCTION OF SIZE MEASURES 


A number of features of the sample design are defined in terms of
measures of size. For example, the first-stage sample of CU's and the
second-stage sample of reaches were selected with a probability proportional
to their size. Size measures associated with this study were defined in terms
of lengths of reaches. Specifically, the size of a CU is the sum of the
lengths of all reaches in the CU. The requirement of the design, in this
regard, is that these measures be available for all reaches and for all
CU's in the target population.


When the River Reach File was originally constructed, the length of
each reach in the file was measured by a highly accurate technique and
included in the reach's file record. However, the size measures required for
allocating and selecting the first-stage sample could not be obtained from 
the River Reach File because it contains some ineligible reaches and does 
not contain other reaches included in the target population. The construc- 
tion of CU size measures was, therefore, part of the first-stage sample se- 
lection process. 


The size measures were obtained from a nationally complete set of USGS 
Hydrologic Unit Maps scaled at 1 inch to 500,000 inches. The total length 
of eligible waterways shown within a cataloging unit's boundaries, measured 
to the nearest one-quarter inch, was its estimated size. These lengths 
were measured for the 2,101 eligible cataloging units, using mechanical map 
measurers (also called map meters), calibrated in inches. The meter has a 
cycle of 39 inches. Therefore, the measurement procedure involved keeping
track of the number of cycles that the meter made while measuring a
particular unit. This potential source of error was minimized by the training
and supervision of the staff involved.
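The bookkeeping involved can be illustrated with a small sketch. It assumes that a measurement is recorded as a count of completed 39-inch cycles plus a final dial reading; the function and parameter names are hypothetical. The conversion to ground distance uses the stated map scale and the standard 63,360 inches per mile.

```python
INCHES_PER_CYCLE = 39      # one full cycle of the map meter (from the text)
MAP_SCALE = 500_000        # 1 map inch represents 500,000 ground inches
INCHES_PER_MILE = 63_360   # standard conversion


def measured_inches(cycles, dial_reading):
    """Total map inches, rounded to the nearest quarter inch.

    `cycles` and `dial_reading` are hypothetical names for the number of
    completed meter cycles and the final reading on the dial.
    """
    raw = cycles * INCHES_PER_CYCLE + dial_reading
    return round(raw * 4) / 4


def ground_miles(map_inches):
    """Ground distance, in miles, represented by a length on the map."""
    return map_inches * MAP_SCALE / INCHES_PER_MILE


size = measured_inches(cycles=2, dial_reading=12.3)
print(size, "map inches, about", round(ground_miles(size), 1), "miles")
```

Miscounting a single cycle shifts a measurement by 39 map inches, roughly 308 miles on the ground at this scale, which is why the cycle count was treated as a source of error worth controlling.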


Another potential source of error arises from the fact that the
intertwining, multidirectional lines of a waterway can present a confusing
picture, making it difficult to ensure that all watercourses are measured
once and only once. To resolve this potential difficulty, grids of
approximately 1-inch squares were ruled on those maps not already printed
with such grids, the length of waterways in each square was measured, and
the square "crossed off." The total inches of waterways within the CU
provided the desired size measure.


Cataloging units were randomly assigned to the staff performing the
work to avoid the possibility of having correlated measurement errors
corresponding to geographical areas or features of the sample design.


STRATIFICATION OF THE FIRST STAGE FRAME 


The first-stage sampling frame of 2,101 CU's was first divided into two
principal parts. It was deemed important that the survey include, with
certainty, data from the nine largest rivers, selected on the basis of
volume and length. The rivers placed in the special "large river" stratum are
listed in Table 3. All of the 98 cataloging units containing reaches of 
these nine rivers were included as one principal part of the sample, from 
which reaches would be drawn. 


Table 3.  List of river systems comprising
          the large river stratum


Name of river

Lower Mississippi
Lower Missouri
Ohio
Columbia
Lower Arkansas
Tennessee
Lower Red
Susquehanna
Alabama Coosa



The remaining 2,003 cataloging units were classified into 13 strata to
provide control over the distribution of the area sample with respect to
urban/industrial development, climate, and agricultural land use. The
stratification variables defined for this purpose were:

(1) Presence (or absence) of one or more cities with a population of
100,000 or more in the group of counties represented by a
cataloging unit;

(2) Ecoregion domain (humid or dry); 

(3) Percentage of total area in irrigated cropland; 

(4) Percentage of total area in other (nonirrigated) cropland; and 

(5) Percentage of total area in range land. 


The area forming the base for determining the percentages was taken as the 
total area of all counties within which a cataloging unit was located. 


Counties in which specific cataloging units were found were identified 
by detailed observation of hydrologic maps (U.S. Geological Survey 1974). 
Total areas of counties were obtained from published data (U.S. Bureau of 
the Census 1972), as were acreages of cropland, irrigated land, and range 
land (U.S. Bureau of the Census 1977, 1980, 1981b), the former being used 
only for North Dakota. Government documents (U.S. Bureau of the Census
1981a) were also used to obtain the locations and populations of large
cities. Dry ecoregion domains were those with ecoregion codes in the 3000 
series; all other ecoregions were classified as humid. 


No effort was made to weight county contributions in proportion to 
areas actually within the unit in the calculation of percentages of a cata- 
loging unit's total area in irrigated cropland, other cropland, and range 
land for several reasons. First, accurate measurement of areas on the hy- 
drologic unit maps would have been quite time consuming and expensive. 
Second, assignment of a county's contributions to the three categories in 
proportion to its total acreages would not necessarily have been in accord 
with the actual situation because of variation in land use within counties. 
Third, it was assumed that general land use patterns in adjoining counties 
tend to be similar. Therefore, changes from equal weighting would not re- 
sult in sizable differences in percentages. Fourth, it appeared that the 
reaches of a cataloging unit were likely to be affected by land use around 
the unit as well as within the unit. Fifth, relatively few units would ac- 
tually be placed in different strata as a result of changes in weighting 
because only broad groupings of percentages would be used in defining 
strata. 


It was desirable to have strata of approximately equal size (equal 
length of reaches) to facilitate the imposition of the replicated feature 
of the design. Examination of the cataloging units showed that approxi- 
mately 3/13 of the total size-measure was in the urban cataloging units (as 
defined by presence of cities of 100,000 or greater population), 6/13 were 
in humid rural units, and 4/13 were in dry rural units. Classification 
was, therefore, first carried out on the basis of rural-urban definition 
and then, within the rural group, according to ecoregion domain. 

Frequency distributions were run on percentages of irrigated cropland 
within each of the three groups that resulted. In each case, a stratum 
with approximately 1/13 of the total size-measure was created to include 
the cataloging units with the highest percentages irrigated. Finally, 
two-way distributions of the remaining units were run by percentages of
nonirrigated cropland and range land, and breaking points were chosen that 
would result in almost equal-sized strata. 


The final stratum definitions are shown in Table 4. The total miles 
of reaches and numbers of CU's in each stratum are shown in Table 5. Also 
shown are the theoretical numbers of cataloging units to be selected in the 
first-stage sample, based on the proportion of miles of reach in each 
stratum (except, of course, for the large river stratum, where all 98 CU's
were automatically included). 


Table 4.  Description of first-stage strata


           City of                      Proportion of land area
Stratum    at least    Ecoregion    Irrigated    Other              Range
number     100,000     domain       cropland     cropland           land

   1       No          Humid(a)     <2.5%        <7.0%              --(c)
   2       No          Humid        <2.5%        ≥7.0%, <27.0%      <2.7%
   3       No          Humid        <2.5%        ≥7.0%, <27.0%      ≥2.7%
   4       No          Humid        <2.5%        ≥27.0%             <4.7%
   5       No          Humid        <2.5%        ≥27.0%             ≥4.7%
   6       No          Humid        ≥2.5%        --                 --
   7       No          Dry(b)       <3.5%        <3.0%              <22.0%
   8       No          Dry          <3.5%        <3.0%              ≥22.0%
   9       No          Dry          <3.5%        ≥3.0%              --
  10       No          Dry          ≥3.5%        --                 --
  11       Yes         --           <1.5%        <22.0%             --
  12       Yes         --           <1.5%        ≥22.0%             --
  13       Yes         --           ≥1.5%        --                 --
  14       Large river stratum


(a) Humid includes ecoregion codes:
    - 2000 humid temperate domain,
    - 4000 humid tropical domain.
(b) Dry is ecoregion code 3000.
(c) A dash indicates the variable is not used in the stratum definition
    (i.e., may have any value).


REPLICATION 


All reaches in the 98 cataloging units in stratum 14 (the large river 
stratum) were eligible for selection in the final sample. The rest of the 
first-stage sample (i.e., 204 cataloging units) was allocated proportion- 
ately across strata 1 through 13, as indicated in Table 5. The total al- 
location to each of these 13 strata was equally divided among four repli- 
cates. 
Table 5.  Sizes of first-stage strata


                            Number of
                            cataloging
Stratum    Size             units in       Sample
number     measures(a)      the stratum    allocation(b)

   1         8,009.00         184           15.6329
   2         8,000.25         160           15.6158
   3         8,023.50         157           15.6612
   4         8,025.75         156           15.6656
   5         8,041.75         150           15.6968
   6         8,049.50         176           15.7119
   7         7,933.50         114           15.4855
   8         7,944.75         153           15.5075
   9         7,913.50         114           15.4465
  10         7,989.75         160           15.5953
  11         8,194.50         148           15.9950
  12         8,241.25         148           16.0862
  13         8,145.75         183           15.8998
  14         6,751.00          98           98
Total      111,263.75       2,101          302


(a) Miles of reaches in the stratum.
(b) Theoretical number of cataloging units to be selected.


The resulting allocations for each replicate were not integers, being 
between three and four in each stratum except stratum 12, for which the 
proportional allocation was between 4 and 5. Probabilities were assigned 
to each of the possible integer values, such that the expected values 
agreed with the theoretical allocations. For example, if the desired 
replicate allocation was 3.75, assigning the probability 0.75 to the in- 
teger 4 and 0.25 to the integer 3 would produce this desired expected 
value. 
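The rule in this example can be written out as a short sketch. The function names are illustrative assumptions, and a pseudo-random generator stands in for whatever randomization device was actually used.

```python
import math
import random


def rounding_probabilities(target):
    """Probabilities for the two integers bracketing a fractional target.

    The probabilities are chosen so that the expected value of the
    integer finally drawn equals the target allocation.
    """
    low = math.floor(target)
    frac = target - low
    return {low: 1 - frac, low + 1: frac}


def draw_replicate_size(target, rng):
    """Draw one integer replicate size with expectation equal to target."""
    low = math.floor(target)
    return low + (1 if rng.random() < target - low else 0)


probs = rounding_probabilities(3.75)
expected = sum(size * p for size, p in probs.items())
print(probs, "expected value:", expected)
print("one draw:", draw_replicate_size(3.75, random.Random(1)))
```

The same rule covers any replicate-level target: the integer part fixes the smaller candidate, and the fractional part is the probability of taking the larger one.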


SELECTION OF THE SAMPLE 


The actual selection of the first-stage sample of cataloging units was 
based on a procedure called "sampling with probability proportional to size
and with minimum replacement." The mathematical process is outlined in
Appendix D, but is illustrated here by simple examples. The same procedure
was used in the stage-two selection of reaches within sampling units.


The term "probability proportional to size" simply means that the
larger CU's (in terms of miles of reaches) had a greater chance of being
selected than the smaller ones. In fact, if $\ell_{hi}$ is the size of CU i
in stratum h and $L_h$ is the total miles of reaches in all CU's in that
stratum (denoted L(h) in Section 3), then CU i has a probability of being
selected proportional to:


$$ \ell_{hi} / L_h $$


The phrase "with minimum replacement" can best be explained by
contrasting it with two other commonly used approaches. One of these is
"without replacement," an approach wherein a unit, once selected, is
removed from the population being sampled, thus assuring that it will not be
selected again. The other approach is "with replacement," wherein a
selected unit remains in the population being sampled and, therefore, could
be selected more than once, according to the laws of probability.


The "minimum replacement" concept recognizes that more than one unit
is to be selected. (In fact, $n_1(h)$, or about 16 units, were selected in
this case, in accordance with Table 5.) The situation could arise where
there were fewer than $n_1(h)$ units in the population, so the "without
replacement" approach clearly would not be applicable. In fact, the
probability of selecting CU i in stratum h is:


$$ p^* = n_1(h) \, \ell_{hi} / L_h $$


with $n_1(h)$ being the proportionality constant implied earlier. This
"probability" can be greater than 1, so it is not really a probability. For
example, if 16 CU's are to be selected, yet one of the available CU's
comprised 10% of the size of the stratum, its "probability" of being selected
would be:


p* = 16(0.1) = 1.6 


In effect, the "minimum replacement" approach ensures that such units
are selected at least once (or more, depending on the integer part of p*), 
are then replaced, and can be selected again with a probability equal to 
the fractional part of p*. 
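The behavior described above can be demonstrated with a sketch. Note the substitution: the report's own procedure is given in Appendix D and is not reproduced here. The code below instead uses systematic sampling with a random start along the cumulated sizes, a simpler technique that has the same property, namely that each unit is selected either the integer part of p* times or one more than that, with expectation p*. All names and the example sizes are assumptions.

```python
import random


def expected_hits(sizes, n):
    """Expected number of selections p* = n * size / total for each unit."""
    total = sum(sizes)
    return [n * s / total for s in sizes]


def systematic_pps(sizes, n, rng):
    """Select n units with probability proportional to size.

    Uses systematic sampling with a random start (a stand-in for the
    Appendix D procedure). Returns the number of times each unit is hit.
    """
    total = sum(sizes)
    step = total / n
    start = rng.random() * step               # random start in [0, step)
    points = [start + k * step for k in range(n)]
    hits = [0] * len(sizes)
    upper = 0.0                               # cumulated size so far
    idx = 0                                   # next selection point to place
    for i, s in enumerate(sizes):
        upper += s
        while idx < n and points[idx] < upper:
            hits[i] += 1
            idx += 1
    hits[-1] += n - idx                       # guard against float round-off
    return hits


# Hypothetical stratum: one CU holds 10% of the size, 15 others hold 6% each.
sizes = [10.0] + [6.0] * 15
print(expected_hits(sizes, 16)[0])            # p* for the large CU
print(systematic_pps(sizes, 16, random.Random(3)))
```

In this hypothetical stratum the large CU is always selected at least once and sometimes twice, while every other CU is selected at most once.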


This concept was applied to the selection of CU's from within the 13 
nonlarge river strata. The exact mathematical procedure implementing this 
concept is given in Appendix D. It was applied separately and indepen- 
dently for each of the four replicates. Thus, it was possible for a CU to 
be selected into more than one replicate. 


The above discussion applies only to the 13 nonlarge river strata. As 
stated previously, all reaches in the 98 CU's in the large river stratum 
were eligible. Appendix C contains the list of CU's in the final first- 
stage sample for all 14 strata. 


SECTION 5. SECOND-STAGE SAMPLE


SECOND-STAGE FRAME 


Given the set of cataloging units (CU's) making up the first stage 
sample, the next step was to construct the second stage frame, or list of 
eligible reaches in each of the sample CU's. As stated previously, the 
frame was required to be complete, in the sense of including all of the 
eligible reaches, and efficient, in the sense of containing few, if any, 
ineligible reaches. 


Another organization, independent of MRI/RTI, constructed this frame. 
Automated digitization equipment was used to trace and record data for each 
reach in the selected 302 cataloging units. The operational definitions 
given in Section 2 were used and applied to the 1:500,000 scale maps. 


At the conclusion of this effort, a computer file was available that 
included, for each of the 302 CU's, the length of each reach in the second 
stage frame, along with a unique reach identification sequence number. The 
reach lengths provided the size measures needed for the randomization pro- 
cedure. 


SAMPLE SELECTION 


Reaches were selected with a probability proportional to their size 
and with minimum replacement, using the procedures discussed in Section 4 
and presented in detail in Appendix D. The second stage randomization pro- 
cedure was applied independently for each replicate. That is, if the same 
CU appeared in more than one replicate, sample reaches were selected from 
the total frame listing for that CU in each replicate. As a result, while 
the total number of selections was equal to the sample size recommendations 
in Section 3, the number of distinct reaches could be less because the same 
reaches could appear in more than one replicate. 


Large River Stratum 





The large river stratum was defined by the area within the 98 catalog- 
ing units containing reaches of the nine large rivers listed in Table 3. A 
total of 79 sample reaches was to be selected from within the large river 
stratum, equally allocated across four replicates. That is, each replicate 
was required to have an expected 19.75 sample reaches. Either 19 or 20 
reaches could be selected in any replicate. Assigning a probability of
0.25 to choosing 19 reaches and assigning a probability of 0.75 to choosing
20 reaches provides the necessary expectation. The choice was made, for
each replicate in turn, using a random number table.


Nonlarge River Strata 





The other 13 strata each consisted of a first-stage sample of 15 or 16 
CU's drawn in four replicates, with 3 or 4 CU's per replicate (a total of 
204 CU's). From each CU, a total of six reaches was selected using the
probability proportional to size with minimum replacement procedure.


SECTION 6. SECOND-PHASE DESIGN


INTRODUCTION 


As noted in Section 2, a two-phase design was originally contemplated, 
and the associated sampling procedures were developed, as were the corre- 
sponding statistical estimation procedures. However, the second-phase sur- 
vey was not conducted (indeed, a second-phase sample was not even selected) 
and it seems inappropriate to include the rather lengthy two-phase estima- 
tion procedures here. Nevertheless, the second-phase sampling procedures 
are documented here, even though they were not completely implemented. 


TWO-PHASE CONCEPT 


A two-phase or double-sampling feature of a design is employed to 
fully account for missing data and, possibly, other types of response 
biases (measurement errors) associated with the collection of reach infor- 
mation by mail. Under the double-sampling design, a subsample of reaches 
would be selected from the initial or phase-one sample and remeasured using 
procedures (as yet unspecified) which were free of biases. Combining the 
phase-one and phase-two information in the form of a difference estimator 
would provide linear statistics that were unbiased estimates of correspond- 
ing population parameters. However, if the potential for measurement bi- 
ases associated with the phase-one data collection procedures can be ig- 
nored, the phase-two design is not needed. 


The State fisheries biologists selected to provide the information for 
the sample reaches were also asked to indicate their assessment of the ac- 
curacy of some of the information provided. The categorizations provided 
by the biologists can be used to post-stratify the sample reaches into 
three groups reflecting the relative accuracy of the assessments made by 
the biologists and including a "no information" stratum. Table 6 suggests 
post-strata based on these assessments. 


The total second-phase subsample would then be proportionately allo- 
cated to the post-strata and the subsample selected with equal probability 
and without replacement from within the post-strata, ignoring the features 
of the phase-one design except for the identification of replicates. 


Table 6.  Suggested classification of reaches into
          post-strata


Post-stratum    Reach characteristic

     3          Answer to one or more key questions
                missing or classified into the "unknown"
                response category.

     2          Not classified into post-stratum 3 and
                3 or more of the key questions classified
                into the "probably yes or no" response
                category.

     1          Not classified into either of the other
                two post-strata.





SAMPLE SIZE AND ALLOCATION 


The proposed phase-two total sample size was approximately 223 reaches 
(Appendix B, Table B-2). The optimal allocation of the sample would 
require knowledge of the relative sizes of the post-strata and of the post- 
strata-level variances of the differences between the phase-one and phase- 
two information. While the post-stratum sizes could be estimated upon 
completion of phase one, some assumptions would be necessary about the mag- 
nitude of the variances involved. 


The initially assumed sizes of the post-strata populations, N(j),
would be estimated as follows. The statistical estimation procedures
appropriate to the two-stage, two-phase design would provide estimates,
$\hat{N}_r(j)$, for each of the four replicates, r = 1, 2, 3, and 4. These
four values would then simply be averaged to obtain an estimate of N(j):


$$ \hat{N}(j) = \frac{1}{4} \sum_{r=1}^{4} \hat{N}_r(j) $$


Then, the allocation to be made to post-stratum j is:


$$ m(j) = m \, \frac{\hat{N}(j) V(j)}{\sum_{j=1}^{3} \hat{N}(j) V(j)} $$


where m is the total phase-two sample size and the V(j) are the measures of
the relative errors in each stratum, assumed to be 0.1, 0.4, and 0.5,
respectively, during the initial sample size calculations (see Appendix B).
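The two steps, averaging the replicate estimates and allocating in proportion to size times relative error, can be sketched as follows. The replicate-level estimates are invented for illustration, since phase two was never carried out; only the total of 223 reaches and the V(j) values come from the report.

```python
V = {1: 0.1, 2: 0.4, 3: 0.5}    # assumed relative errors (Appendix B)
M_TOTAL = 223                   # proposed phase-two sample size


def average_replicates(replicate_estimates):
    """Average replicate-level estimates of the post-stratum sizes N(j)."""
    r = len(replicate_estimates)
    return {j: sum(est[j] for est in replicate_estimates) / r
            for j in replicate_estimates[0]}


def allocate_phase_two(n_hat, v, m=M_TOTAL):
    """Allocate m reaches to post-strata in proportion to N(j) * V(j)."""
    products = {j: n_hat[j] * v[j] for j in n_hat}
    total = sum(products.values())
    return {j: m * prod / total for j, prod in products.items()}


# Hypothetical replicate-level estimates of the post-stratum sizes.
replicates = [
    {1: 480, 2: 310, 3: 190},
    {1: 520, 2: 290, 3: 210},
    {1: 510, 2: 305, 3: 195},
    {1: 490, 2: 295, 3: 205},
]
n_hat = average_replicates(replicates)
print(n_hat)
print(allocate_phase_two(n_hat, V))
```

With these made-up sizes, post-stratum 1 is the largest but receives the fewest phase-two reaches, because its assumed relative error is the smallest.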


The values, m(j), would then be equally divided among the four repli- 
cates in the expected value sense. Each phase-two replicate-level sample 
would be selected with equal probability and without replacement from 
within post-strata, ignoring the structure of the phase-one design. 


SECTION 7. ESTIMATION PROCEDURES


GENERAL 


The survey procedures resulted in information supplied by knowledge- 
able fisheries biologists for a national probability sample of reaches. It 
remained to assemble this information so that unbiased estimates of se- 
lected variables for the national population of reaches could be obtained. 


The estimation procedures, although straightforward enough in concept, 
were complicated mathematically. These complications were partly nota- 
tional, as is made clear in Appendix E, which presents the estimation equa- 
tions. The complications were also a result of the survey design. 


As described in the previous sections of this report, the final design 
was a stratified, two-stage design. There were 14 strata, of which 13 en- 
tailed a first-stage sample of cataloging units from which a second-stage 
sample of reaches was selected. For the remaining (large river) stratum, a 
single stage of sampling selected reaches from the frame (total population) 
of all qualifying reaches. Because of this difference in sampling proce- 
dures, the forms of the equations (or terms in the equations) for the large 
river stratum were different from those of the other strata. 


WEIGHTS 


The estimation procedures assembled the response variables of interest
in a weighted-average or ratio sense. The calculation of the weights to be
associated with each sample reach is described in Appendix E. Essentially,
the weight assigned to a reach was the reciprocal of the expected frequency
with which it was selected according to the sampling procedure.


MISSING DATA 


Because it is generally expected that not all the planned data will, 
in fact, be collected in surveys, procedures were needed to adjust the na- 
tional estimates for these missing data. To this end, response variable 
values missing from certain of the sampled reaches were replaced (esti- 
mated) by a “class average." This was the weighted average of all other 
sample reaches in the same cataloging unit for the first 13 strata or the 
weighted average of all other sample reaches in the stratum for the large 
river stratum. 
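A minimal sketch of this class-average replacement is given below. The record layout (the keys `cls`, `weight`, and `value`) is a hypothetical choice for the example; `cls` stands for the cataloging unit in strata 1 through 13 and for the stratum itself in the large river stratum.

```python
from collections import defaultdict


def impute_class_average(records):
    """Replace missing values by the weighted class average.

    Each record is a dict with hypothetical keys 'cls', 'weight', and
    'value' (None when missing). A class with no reported values is left
    unchanged.
    """
    sums = defaultdict(float)
    weights = defaultdict(float)
    for r in records:
        if r["value"] is not None:
            sums[r["cls"]] += r["weight"] * r["value"]
            weights[r["cls"]] += r["weight"]

    filled = []
    for r in records:
        value = r["value"]
        if value is None and weights[r["cls"]] > 0:
            value = sums[r["cls"]] / weights[r["cls"]]
        filled.append({**r, "value": value})
    return filled


sample = [
    {"cls": "CU-A", "weight": 2.0, "value": 1.0},
    {"cls": "CU-A", "weight": 1.0, "value": 0.0},
    {"cls": "CU-A", "weight": 3.0, "value": None},   # missing response
    {"cls": "CU-B", "weight": 5.0, "value": None},   # no class data at all
]
for row in impute_class_average(sample):
    print(row)
```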


REPLICATION


The original two-phase design included replication in the first phase 
to provide unbiased variance estimates for estimators of the differences 
between the phase-one and phase-two results. Although the second phase was 
not implemented, the replication feature already in place was retained. 
Consequently, a choice of estimation procedures was available: either use 
the replicates or pool the data without regard to replicates. The latter 
procedure was actually used, in anticipation of its leading to a more effi- 
cient variance estimation (both approaches yield the same parameter esti- 
mates). However, both variance estimation procedures are described in Ap- 
pendix D. 


ESTIMATED PARAMETERS 


Depending on the variable, different types of parameter estimates 
might be of interest. One type is an estimate of a population total. For 
example, it might be of interest to estimate the total mileage of reaches 
in the Nation. Such a total is calculated as the sum of the mileages of 
the reaches in the sample, each having first been multiplied by its sam- 
pling weight. 


A second type of estimate of common interest is a mean or proportion. 
Simple examples would be the mean length of a reach, nationally, or the 
proportion of reaches in the Nation that support sport fish. In either 
case, the estimation procedure requires calculation of the ratio of two es- 
timates. In the first example, the estimated total mileage of reaches 
would be divided by the estimated (because it is not known absolutely) num- 
ber of reaches in the population. The second example entails the ratio of 
the estimated number of reaches that support sport fish to the estimated 
total number of reaches. 
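Both kinds of estimate can be illustrated with a few lines of code. The three sample reaches, their expected selection frequencies, and the function names are assumptions made for the example; the exact estimation equations are those of the appendix and are not reproduced here.

```python
def sampling_weights(expected_frequencies):
    """Weight = reciprocal of the expected selection frequency."""
    return [1.0 / f for f in expected_frequencies]


def weighted_total(values, weights):
    """Estimated population total: sum of weight times sample value."""
    return sum(w * v for v, w in zip(values, weights))


def ratio_estimate(numerators, denominators, weights):
    """Ratio of two estimated totals (used for means and proportions)."""
    return (weighted_total(numerators, weights)
            / weighted_total(denominators, weights))


# Hypothetical sample of three reaches.
lengths = [4.0, 10.0, 6.0]        # miles
sport_fish = [1, 0, 1]            # 1 if the reach supports sport fish
ones = [1, 1, 1]                  # counts each reach once
weights = sampling_weights([0.01, 0.02, 0.005])

total_miles = weighted_total(lengths, weights)
mean_length = ratio_estimate(lengths, ones, weights)
prop_sport = ratio_estimate(sport_fish, ones, weights)
print(total_miles, mean_length, round(prop_sport, 3))
```

The denominator in both ratios is the estimated number of reaches in the population, obtained by summing the weights.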


It may also be of interest to estimate regression relationships
between the variables. In its simplest form, such a relationship can be
expressed as:


$$ y = \beta x + e $$


where x is the independent or predictor variable (e.g., whether or not the
presence of toxic substances in the reach is an adverse condition of major
concern) and y is the dependent or predicted variable (e.g., whether or not
the reach supports sport fish). In this example, $\beta$ is the regression
coefficient to be determined from the sample data (the y's and x's), and e is
an indication of the degree to which the regression prediction fails to be
exact. The example can be generalized to the extent that there may be many
predictor variables (x's) to be considered simultaneously, in which case
$\beta$ is a series (vector) of coefficients.
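For the single-predictor form above, one common way to bring the sampling weights into the fit is weighted least squares without an intercept. The sketch below takes that approach as an assumption; it is not taken from the report, whose own estimating equations are in the appendix. The data are invented.

```python
def weighted_slope(x, y, weights):
    """Weighted least-squares slope for y = beta * x + e (no intercept).

    Minimizes sum(w * (y - beta * x) ** 2), which gives
    beta = sum(w * x * y) / sum(w * x * x).
    """
    num = sum(w * xi * yi for xi, yi, w in zip(x, y, weights))
    den = sum(w * xi * xi for xi, w in zip(x, weights))
    return num / den


# Hypothetical 0/1 data for four sample reaches.
x = [1, 1, 0, 1]             # toxic substances are a major concern
y = [1, 0, 1, 1]             # reach supports sport fish
w = [2.0, 1.0, 3.0, 4.0]     # sampling weights

print(weighted_slope(x, y, w))
```

With a 0/1 predictor and no intercept, the slope reduces to the weighted proportion of reaches with y = 1 among those with x = 1.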

Specific equations are given in Appendix D for calculating all of the
above types of parameter estimates (i.e., totals, means, proportions, and 
regression coefficients). Also presented in Appendix D are the relation- 
ships for estimating the variances of these parameter estimates. The vari- 
ance calculation associated with totals is exact. The other parameter es- 
timates all involve ratio estimates, which are nonlinear statistics. 
Therefore, a linear approximation is suggested. 


APPENDIX A. SAMPLE SIZE RELATIONSHIPS


This Appendix presents the mathematical relations needed to determine 
the sample sizes. Denote by: 


u(g), g = 1, 2, ..., N


the units (i.e., the reaches) in the population. The parameters of central 
interest are the proportions, P, defined by: 


$$ P = \frac{\sum_{g=1}^{N} Y(g)}{\sum_{g=1}^{N} L(g)} $$


where Y(g) = that part of the total length of reach g that exhibits the 
specified characteristics (i.e., belongs to the domain of 
interest) 
L(g) = the total length of reach g 


Estimates of the parameters defined by this equation are nonlinear 
statistics. To avoid the complication introduced by this fact, define as 
an approximation: 


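$$ P \approx \frac{1}{N} \sum_{g=1}^{N} \delta(g) = \bar{\delta} \qquad (1) $$


where

$$ \delta(g) = \begin{cases} 1 & \text{if reach } g \text{ belongs to the specified domain of interest} \\ 0 & \text{if reach } g \text{ does not belong to the specified domain of interest} \end{cases} \qquad (2) $$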

The variance of the proportion in the population is defined by:


$$ \sigma_\delta^2 = \frac{1}{N} \sum_{g=1}^{N} [\delta(g) - \bar{\delta}]^2 = P[1-P] = \frac{N-1}{N} S_\delta^2 \qquad (3) $$
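The equality between the population variance of a 0/1 indicator and P[1-P] is easy to confirm numerically. The sketch below does so for a small made-up population; the function name and the data are assumptions.

```python
def proportion_and_variance(delta):
    """Return the proportion P and the variance (1/N) * sum((d - P)^2)."""
    n = len(delta)
    p = sum(delta) / n
    variance = sum((d - p) ** 2 for d in delta) / n
    return p, variance


# Hypothetical population of eight reaches: 1 = in the domain of interest.
delta = [1, 0, 0, 1, 1, 0, 1, 1]
p, variance = proportion_and_variance(delta)
print(p, variance, p * (1 - p))
```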


An observation, x(g), is obtained using the phase-one data collection 
procedure and is used to classify reach g as to whether or not it belongs 
to the domain of interest. These observations take the form: 


$$ x(g) = \delta(g) + a(g) + e(g) \qquad (4) $$


where a(g) expresses a "biologist" effect, due to the level of knowledge 
concerning reach g (in the population) that is available to the biologist 
designated to respond for that reach, and e(g) reflects any variability in 
x-values obtained from the designated respondent for reach g, in the con- 
text of obtaining repeated observations of x(g). 


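Taking expectations in the direction of repeated observations, denoted by
the symbol $\mathcal{E}_e$:


$$ \mathcal{E}_e\{e(g)\} = 0 \qquad (5) $$

$$ \sigma_e^2 = \frac{1}{N} \sum_{g=1}^{N} \mathcal{E}_e\{e^2(g)\} \qquad (6) $$

$$ \rho_e \sigma_e^2 = \frac{1}{N[N-1]} \sum_{g=1}^{N} \sum_{g' \neq g} \mathcal{E}_e\{e(g) e(g')\} \qquad (7) $$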


That is, in the sense of collecting repeated observations of an x-value for 
a given reach from the same designated respondent, the different e-values 
contribute to the total variability but do not introduce bias. 


On the other hand, the a-values do not vanish in the context of
repeated sampling. Consequently, the following quantities can be defined at
population levels:


$$ \bar{a} = \frac{1}{N} \sum_{g=1}^{N} a(g) \qquad (8) $$

$$ \sigma_a^2 = \frac{1}{N} \sum_{g=1}^{N} [a(g) - \bar{a}]^2 = \frac{N-1}{N} S_a^2 \qquad (9) $$


Covariances among the a-values for different reaches are assumed not to 
exist. 


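The mean of the x-values in the population of reaches is given by:


$$ \bar{X} = \frac{1}{N} \sum_{g=1}^{N} x(g) = \bar{\delta} + \bar{a} + \frac{1}{N} \sum_{g=1}^{N} e(g) \qquad (10) $$

with expectation:

$$ \mathcal{E}_e\{\bar{X}\} = \bar{\delta} + \bar{a} \qquad (11) $$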


Using the phase-two data collection procedures, an observation of the
form:


$$ y(g) = \delta(g) \qquad (12) $$

is obtained. In this connection, observations of the form:

$$ d(g) = x(g) - y(g) \qquad (13) $$

assume some importance.

As a part of the phase-one data collection procedures, the designated
respondents supplying the information were asked to classify their
responses into categories that could be described as:

(1) definitely yes or no
(2) probably yes or no
(3) unknown

Conceptually, each unit in the population could be classified into one of
these three categories.


At population levels:


$$ \bar{D} = \frac{1}{N} \sum_{g=1}^{N} d(g) \qquad (14) $$


Denoting the response categories or post-strata described above by
j = 1, 2, 3, equation (14) can be rewritten as:


$$ \bar{D} = \frac{1}{N} \sum_{j=1}^{3} N(j) \bar{D}(j) = \frac{1}{N} \sum_{j=1}^{3} N(j) \bar{a}(j) + \frac{1}{N} \sum_{g=1}^{N} e(g) \qquad (15) $$


where N(j) = the size of post-stratum j in the population

      $\bar{a}(j) = \frac{1}{N(j)} \sum_{g \in j} a(g)$, with the symbol
             $\sum_{g \in j}$ denoting summation over all units in
             post-stratum j



Note the implication in equation (15) that the imposed stratification in- 
fluences the bias structure but not the random measurement error structure. 


The expectation of equation (15) is:

$$\mathcal{E}_e\{\bar{D}\} = \bar{a} \tag{16}$$


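Similarly, the post-stratum level variances are defined at population
levels by:

$$S_d^2(j) = \frac{1}{N(j)-1}\sum_{g\in j}[d(g) - \bar{D}(j)]^2 \tag{17}$$

with expectation:

$$\mathcal{E}_e\{S_d^2(j)\} = \mathcal{E}\Big\{\frac{1}{N(j)-1}\sum_{g\in j}[d(g) - \bar{D}(j)]^2 \,\Big|\, e(1), e(2), \ldots, e(N)\Big\} + \mathrm{Var}\Big\{\frac{1}{N(j)}\sum_{g\in j} d(g) \,\Big|\, e(1), e(2), \ldots, e(N)\Big\}$$

$$= S_a^2(j) + \sigma_e^2\,[1-\rho_e] + \frac{\sigma_e^2}{N}\,[1+[N-1]\rho_e] \tag{18}$$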


The above follows the development as presented by Konijn (1973). 


Under the proposed survey design, the x-values were obtained for a
total of n sample reaches using a stratified, two-stage area sampling
design. The respondent-supplied classification of responses was used to
construct the j = 1, 2, 3 post-strata identified above. A second-phase
subsample of m reaches would be selected from within the post-strata
without regard to the area design.


An unbiased estimate of the population parameter, P (or $\bar{\delta}$), would be
provided by the difference estimator:

$$\hat{P} = \bar{Y}(m) + \bar{X}(n) - \bar{X}(m) \tag{19}$$


where $\bar{Y}(m)$ = the estimate based on the information obtained from the
              m units in the subsample using the phase-two data
              collection procedures

      $\bar{X}(n)$ = the estimate based on the information obtained from the
              n units in the full sample using the phase-one data
              collection procedures

      $\bar{X}(m)$ = the estimate based on the information obtained from the
              m units in the subsample using the phase-one data
              collection procedures

Note that by defining:

$$d(g) = x(g) - y(g)$$

over all the units in the subsample, equation (19) can be rewritten as:

$$\hat{P} = \bar{X}(n) - \bar{D}(m) \tag{20}$$
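
The equivalence of equations (19) and (20) can be checked numerically. The following is a minimal sketch, assuming simple unweighted means stand in for the three estimates; the 0/1 observations, the subsample indices, and the function name are made up for illustration and are not survey data.

```python
from statistics import mean

def difference_estimate(x_full, x_sub, y_sub):
    """Equation (19): Y(m) + X(n) - X(m), using simple unweighted means."""
    return mean(y_sub) + mean(x_full) - mean(x_sub)

# Hypothetical phase-one x-values for n = 10 reaches (1 = in the domain).
x_full = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]

# Hypothetical phase-two subsample of m = 4 of those reaches.
sub_idx = [0, 1, 4, 5]
x_sub = [x_full[i] for i in sub_idx]   # phase-one values for the subsample
y_sub = [1, 0, 0, 0]                   # phase-two values for the same reaches

p_19 = difference_estimate(x_full, x_sub, y_sub)

# Equation (20): the same estimate written with d(g) = x(g) - y(g).
d_sub = [x - y for x, y in zip(x_sub, y_sub)]
p_20 = mean(x_full) - mean(d_sub)
```

Here the full-sample mean of 0.40 is pulled down by the subsample mean difference of 0.25, so both forms give an estimate of about 0.15.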

Let:

      n(j)/n = the observed relative size of post-stratum j
      m(j) = the size of the phase-two subsample selected from
             post-stratum j

As a simplifying approximation, take:

$$n = n_1 n_2$$

That is, the first-stage sample consists of n_1 first-stage cataloging
units, each of size n_2 second-stage reaches, with equal probability
selection of units at each stage. Given the simplified design:





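$$\bar{X}(n) = \frac{1}{n}\sum_{g\in n} x(g) \tag{21}$$

with the symbol $\sum_{g\in n}$ denoting summation over all units in the phase-one
sample and:

$$\bar{D}(m) = \sum_{j=1}^{3}\frac{n(j)}{n}\,\bar{d}(j) \tag{22}$$

where $\bar{d}(j)$ is the mean of d(g) over the m(j) subsample units in
post-stratum j.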


The variance of the approximation of P as given in equation (1) can be
written as:

$$\mathrm{Var}\{\hat{P}\} = \mathcal{E}_e\{\mathrm{Var}_{\mathcal{D}}\{\hat{P}\}\} + \mathrm{Var}_e\{\mathcal{E}_{\mathcal{D}}\{\hat{P}\}\} \tag{23}$$

where the symbol $\mathcal{D}$ denotes the operation over the probability structure
of the design.


Consider first:

$$\mathcal{E}_{\mathcal{D}}\{\hat{P}\} = \mathcal{E}_1\{\mathcal{E}_2\{\mathcal{E}_{II}\{\hat{P}\}\}\}$$

where the Arabic numerals denote the operation taken conditionally over the
stages of the area sampling design and the Roman numerals denote the
operation taken conditionally on the first-phase information. This is:

$$\mathcal{E}_{\mathcal{D}}\{\hat{P}\} = \Big[\bar{\delta} + \bar{a} + \frac{1}{N}\sum_{g=1}^{N} e(g)\Big] - \Big[\bar{a} + \frac{1}{N}\sum_{g=1}^{N} e(g)\Big] = \bar{\delta}$$

Hence, the last term in equation (23) is $\mathrm{Var}_e\{\bar{\delta}\}$ or $\mathrm{Var}_e\{P\}$, because the
expected value of the random measurement error component of the variance
cancelled when taken over the design.
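
Consider, next:

$$\mathrm{Var}_{\mathcal{D}}\{\hat{P}\} = \mathcal{E}_{I}\{\mathrm{Var}_{II}\{\hat{P}\}\} + \mathrm{Var}_{I}\{\mathcal{E}_{II}\{\hat{P}\}\}$$

where $\mathcal{E}_{II}\{\hat{P}\} = \mathcal{E}\{\bar{X}(n) - \bar{D}(m) \mid x(1), x(2), \ldots, x(n)\}$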


For the simplified area sample design:

$$\mathrm{Var}_{I}\{\bar{Y}(n)\} = \frac{\sigma_\delta^2}{n_1 n_2}\,(1 + [n_2 - 1]\rho_c)$$

where ρ_c is the correlation among reaches in the same cataloging unit. The
expression assumes that the phase-one sampling fractions were vanishingly
small.

On the other hand:

$$\mathrm{Var}_{II}\{\hat{P}\} = \mathrm{Var}\{\bar{X}(n) - \bar{D}(m) \mid x(1), x(2), \ldots, x(n)\} = \sum_{j=1}^{3}\Big[\frac{n(j)}{n}\Big]^2 \frac{S_d^2(j)}{m(j)}$$

if terms in m(j)/N(j) are ignored. Kendall and Stuart (1966) give an
example of this. The expected value:

$$\mathcal{E}_{I}\{\mathrm{Var}_{II}\{\hat{P}\}\} = \sum_{j=1}^{3}\Big[\frac{N(j)}{N}\Big]^2 \frac{S_d^2(j)}{m(j)}$$

if the phase-one sample is large enough that

$$\frac{1}{n}\,\frac{N(j)}{N}\Big[1 - \frac{N(j)}{N}\Big] \to 0$$

as demonstrated by Kendall and Stuart (1966) and if the area clustering
effect operates only on the δ-values and not on the unit level biases.


Hence, the variance given the design is taken as:

$$\mathrm{Var}_{\mathcal{D}}\{\hat{P}\} = \frac{\sigma_\delta^2}{n_1 n_2}\,(1 + [n_2-1]\rho_c) + \sum_{j=1}^{3}\Big[\frac{N(j)}{N}\Big]^2 \frac{S_d^2(j)}{m(j)} \tag{24}$$

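Finally, taking the expectation of equation (24) over the random
measurement error dimension gives:

$$\mathrm{Var}\{\hat{P}\} = \frac{\sigma_\delta^2}{n_1 n_2}\,(1 + [n_2-1]\rho_c) + \frac{1}{m}\sum_{j=1}^{3}\frac{N(j)}{N}\,\mathcal{E}_e\{S_d^2(j)\} \tag{25}$$

with $\mathcal{E}_e\{S_d^2(j)\}$ as given in equation (18), and taking
$m(j) = m\,\mathcal{E}_{I}\{n(j)/n\}$ in order to evaluate the last term.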


The rest of this development follows that presented by Kendall and
Stuart (1966). The central notion is to rewrite equation (25) in the form:

$$\mathrm{Var}\{\hat{P}\} = \frac{V_{I}}{n} + \frac{V_{II}}{m} \tag{26}$$

with the Roman numerals denoting the phase of sampling, and to solve for
the ratio m/n. By taking:

$$m(j) = m\,\frac{N(j)\,S_d(j)}{\sum_{j=1}^{3} N(j)\,S_d(j)} \tag{27}$$

$$n_2 = \Big[\frac{C_1}{C_2}\,\frac{1-\rho_c}{\rho_c}\Big]^{1/2}$$

where C_1 = the per unit cost of a first stage sampling unit
      C_2 = the per unit cost of a second stage sampling unit

the values:

$$V_{I} = \sigma_\delta^2\,[1 + [n_2-1]\rho_c]$$

$$V_{II} = \Big[\sum_{j=1}^{3}\frac{N(j)}{N}\,S_d(j)\Big]^2$$

are obtained and equation (26) is solved for the ratio m/n.
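
The allocation rule in equation (27) can be illustrated with a short sketch. The post-stratum shares and K-values are those listed for trial 27 in Table B-1; the choice of S_d(j) proportional to the square root of K(j)[1 - K(j)] is an assumption made here purely for illustration and is not stated in the report, and the function name is hypothetical.

```python
from math import sqrt

def allocate(m, weights, s_d):
    """Equation (27): m(j) = m * N(j) S_d(j) / sum over j of N(j) S_d(j).

    weights holds the relative sizes N(j)/N; the common factor N cancels.
    """
    products = [w * s for w, s in zip(weights, s_d)]
    total = sum(products)
    return [m * p / total for p in products]

# Trial 27 design values from Table B-1.
weights = [0.15, 0.45, 0.40]           # N(1)/N, N(2)/N, N(3)/N
k = [0.99, 0.80, (2 / 3) * 0.80]       # K(1), K(2), K(3) = 2/3 K(2)

# ASSUMPTION (not from the report): S_d(j) proportional to sqrt(K(1 - K)).
s_d = [sqrt(kj * (1 - kj)) for kj in k]

m_j = allocate(223, weights, s_d)      # m = 223 is the trial 27 value
rounded = [round(v) for v in m_j]
```

Under this assumed form of S_d(j), the rounded allocation is 8, 102, and 113 reaches, which agrees with the m(1), m(2), and m(3) entries for trial 27 in Table B-2. The agreement suggests, but does not establish, that a binomial-type variance was used in the original calculations.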
A variance constraint is imposed by requiring:

$$\mathrm{Var}\{\hat{P}\} \le [\Delta P]^2 \tag{29}$$

where Δ is the relative standard error of the estimate. Given the values
V_I, V_II, m/n and ΔP, the minimum sample size needed to satisfy the
variance constraint is given by:

$$n = \frac{1}{[\Delta P]^2}\Big[V_{I} + \frac{V_{II}}{m/n}\Big] \tag{30}$$
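
The sample-size step can be sketched as follows. This is a sketch under stated assumptions rather than the report's own procedure: it assumes the two-phase variance has the form V_I/n + V_II/m, that m/n is set to the usual cost-optimal double-sampling ratio, and that the unit variance equals P(1 - P) for a 0/1 variable. The value of V_II below is hypothetical; the other inputs are the trial 1 entries of Table B-1.

```python
from math import sqrt

def phase_sizes(v1, v2, cost_ratio, delta, p):
    """Return (n, m) so that v1/n + v2/m equals (delta * p) ** 2.

    cost_ratio is C_I/C_II, the phase-one to phase-two per-unit cost ratio.
    ASSUMPTION: m/n is the cost-optimal ratio sqrt(v2 * C_I / (v1 * C_II)).
    """
    ratio = sqrt(v2 * cost_ratio / v1)           # m/n
    n = (v1 + v2 / ratio) / (delta * p) ** 2     # smallest n meeting the bound
    return n, ratio * n

# Trial 1 inputs from Table B-1: P = 0.10, rho = 0.01, C_I/C_II = 0.20,
# with n_2 = 10 reaches per cataloging unit.
p, delta, rho, n2 = 0.10, 0.10, 0.01, 10
v1 = p * (1 - p) * (1 + (n2 - 1) * rho)   # ASSUMPTION: unit variance = P(1 - P)
v2 = 0.0015                               # hypothetical, for illustration only

n, m = phase_sizes(v1, v2, 0.20, delta, p)
```

With these inputs the sketch gives n near 1,250 and m near 70, the same order as the trial 1 row of Table B-2, although the V_II actually used in the report is not recoverable from the text.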



















APPENDIX B. SAMPLE SIZE CALCULATIONS 


Even though it is not necessary to present all the equations here 
(they are given in Appendix A), a certain amount of notation is helpful 
to the understanding of the sample-size calculations. 


The population or universe of interest is defined in terms of river
reaches shown in U.S. Geological Survey maps scaled 1 inch to 500,000
inches. Conceptually, they can be numbered by:


g=1, 2, ..., N 


where N is the total number of reaches in the population. 


The (population) parameters of major interest to the survey can be de- 
scribed in terms of the proportion of either the total number of reaches or 
total miles of reaches nationally having specified characteristics. The 
sample-size estimates were developed in terms of the proportion of reaches 
(not miles) having the specified characteristics to avoid the complications 
introduced by nonlinear statistics. That is: 


$$P = \frac{1}{N}\sum_{g=1}^{N}\delta(g)$$

where δ(g) = 1 if reach g belongs to the specified domain of interest
             (e.g., it does support sport fish),
           = 0 if reach g does not belong to the specified domain of
             interest (e.g., it does not support sport fish)


The approximation is equivalent to assuming all reaches to be of equal 
length, given the domains of interest, and allows the development to pro- 
ceed on the basis of sampling-unit means and variances. 


An observation, x(g), obtained from the phase-one data collection pro- 
cedure was used to classify reach g in terms of whether or not it belongs 
to the domain of interest. These observations take the form: 


$$x(g) = \delta(g) + a(g) + e(g)$$

where a(g) expresses a "biologist" effect, due to the degree of knowledge
concerning reach g (in the population) available to the biologist
designated to respond for that reach, and e(g) reflects any variability in
x-values obtained from the designated respondent for reach g, in the
context of obtaining repeated observations of x(g).


It is assumed that the different e-values contribute to the total 
variability but do not introduce bias (i.e., they would tend to average 
zero with repeated observations of an x-value for a given reach from the 
same designated respondent). 


On the other hand, the a-values would not vanish with repeated
sampling. Although the respondent selection procedure was viewed as
ensuring that the biologist most knowledgeable about a given reach was the
designated respondent, this provides no guarantee that the designated
respondent necessarily possesses accurate, or indeed any, information about
the reach in question. Consequently, the a-values contributed to the total
variability and may also have introduced bias.
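
The distinction between the two error components can be seen in a small simulation. This is only a sketch: it treats the observation as a continuous quantity, and the values chosen for the true status, the biologist effect, and the spread of the random error are hypothetical.

```python
import random

rng = random.Random(0)        # fixed seed so the sketch is repeatable

delta = 1                     # hypothetical true status of the reach
a = -0.3                      # hypothetical biologist effect (a bias)

# Repeated observations x = delta + a + e, with e averaging zero.
obs = [delta + a + rng.gauss(0, 0.2) for _ in range(10000)]
mean_x = sum(obs) / len(obs)

# Averaging removes e but not a: the mean settles near delta + a.
remaining_bias = mean_x - delta
```

The mean of the repeated observations settles near 0.7 rather than 1, so the random error has averaged out while the biologist effect of about -0.3 remains.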


While it appears likely that the same respondent might be designated 
for a set of reaches, thereby possibly inducing covariances reflecting the 
biologist's attributes, there appears to be no way of including such co- 
variances in the model without the knowledge of the association of reaches 
with biologists. Therefore, they are assumed not to exist. 


Using the phase-two data collection procedures, an observation of the 
form: 


$$y(g) = \delta(g)$$

is presumed to be obtained. Because of the cost differential

$$C_{I} \ll C_{II}$$

where C_I  = the cost of obtaining an observation, x(g) (i.e., the per
             unit cost of the phase-one data collection procedure)
      C_II = the cost of obtaining an observation, y(g) (i.e., the per
             unit cost of the phase-two data collection procedure)


the survey could not be undertaken using the phase-two data collection pro- 
cedures alone. Rather, the costly phase-two procedures were to be used to 
correct the measurement biases associated with the less costly phase-one 
procedures. In this connection, the differences of the form: 


d(g) = x(g) - y(g) 
assume some importance. 


As a part of the phase-one data collection procedure, the designated
respondents supplying the information were asked to classify their
responses into categories described as:

(1) definitely yes or no
(2) probably yes or no
(3) unknown


Conceptually, information about each reach in the population could be 
classified into one of these three categories. 


It is noted that x-values are not directly available for units in the
"unknown" category. Compensation procedures based on the available
information are commonly used to obtain estimates of parameters describing
the nonresponding or missing data component of the population. The various
compensation procedures take the form of substituting particular means,
computed using the available data, in place of the missing observations.
In this sense, x-values can always be supplied for units in the "unknown"
category, although the bias properties will generally be dissimilar for
this group. Nonetheless, the development can proceed as though x-values
were available for all units by including some compensation procedure as a
part of the total phase-one data collection procedure.


Under the proposed survey design, the x-values would be obtained for a
sample of n reaches from the total population, N. The respondent-supplied
classification of responses would be used to construct three categories or
"post-strata." A second-phase subsample of m reaches then would be
selected from within the post-strata without regard to the first-phase
design.


Denote by: 
N(j), the (unknown) size of the post-stratum j in the population; 
n(j), the observed size of post-stratum j in the sample; and by 


m(j), the size of the phase-two subsample selected from post- 
stratum j. 


As a simplifying approximation, take:

$$n = n_1 n_2$$

That is, the first-stage sample consists of n_1 first-stage cataloging
units (CU's), each with n_2 reaches to be selected in the second stage.


The calculations require that a few more factors be examined. One of
these factors arises because it is likely that reaches within a CU are more
alike than are reaches from different CU's. Because the design "clusters"
reaches by CU's, the correlation among reaches in the same CU, ρ_c, is
introduced.
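
The effect of this correlation on precision can be sketched with the standard variance formula for a mean of equal-sized clusters. The unit variance of P(1 - P) for a 0/1 variable is an assumption; the other numbers follow the P, n_1, n_2, and correlation values shown for trial 1 in Tables B-1 and B-2, and the function name is illustrative.

```python
def cluster_variance(sigma2, n1, n2, rho):
    """Variance of a mean of n1 * n2 units drawn as n1 clusters of n2 units."""
    return sigma2 / (n1 * n2) * (1 + (n2 - 1) * rho)

sigma2 = 0.10 * (1 - 0.10)    # ASSUMPTION: unit variance P(1 - P), P = 0.10
n1, n2, rho = 125, 10, 0.01   # trial 1: 125 CU's, 10 reaches each

var_clustered = cluster_variance(sigma2, n1, n2, rho)
var_srs = sigma2 / (n1 * n2)          # same number of reaches, no clustering
deff = var_clustered / var_srs        # inflation due to clustering
```

Even a correlation as small as 0.01 inflates the variance by 9% when ten reaches are taken per CU; the inflation factor is 1 + (n_2 - 1) times the correlation.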








Another set of costs is:

      C_1 = the cost of selecting and using each first-stage sampling
            unit (CU)
      C_2 = the cost of selecting and using each second-stage sampling
            unit

These two factors reflect the cost differences between the two stages of
the design, as contrasted with C_I and C_II, which correspond to the two
phases.


Of considerable importance to the allocation problem is the probability
with which reaches would be correctly classified in phase one in each
of the three post-strata. Again, if an x(g) or y(g) observation can take
on the value of one or zero, according to whether the reach is reported as
belonging to or not belonging to the domain of interest, the possibilities
are:

      {y(g)=1, x(g)=1}
      {y(g)=1, x(g)=0}
      {y(g)=0, x(g)=0}
      {y(g)=0, x(g)=1}

with:

      k(g) = Prob {y(g)=1, x(g)=1} + Prob {y(g)=0, x(g)=0}

specifying the probability of a correct classification.


In general, the probability of a correct classification for the survey
as a whole is determined by:

$$K = \sum_{j=1}^{3}\frac{N(j)}{N}\,K(j)$$

where the K(j) are the probabilities of a correct classification for the
three post-strata. It is likely that:


$$0 < K(3) < K(2)$$

that is, the probability of being able to synthesize correct
classifications in the case of missing information is less than the
probability of a correct classification in the probable post-stratum. A
reasonably optimistic position was taken, that K(3) was somewhat greater
than the midpoint of this range, namely:

$$K(3) = \tfrac{2}{3}\,K(2)$$

Finally, the parameters and resulting sample sizes were constrained by
the requirement that the variance of the expected value of P, estimated by
the survey, not exceed:

$$[\Delta P]^2$$

where Δ is the relative standard error of the estimate.
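
The survey-wide probability of a correct classification described above is a weighted average of the post-stratum values. The following sketch evaluates it for the trial 27 entries of Table B-1, with K(3) set to two-thirds of K(2) as stated in the text; the function name is illustrative.

```python
def overall_k(weights, k):
    """Weighted average of K(j), with weights N(j)/N."""
    return sum(w * kj for w, kj in zip(weights, k))

k2 = 0.80
k = [0.99, k2, (2 / 3) * k2]       # K(1), K(2), K(3) = 2/3 K(2)
weights = [0.15, 0.45, 0.40]       # N(1)/N, N(2)/N, N(3)/N for trial 27

k_overall = overall_k(weights, k)
```

For these values roughly 72% of reaches would be expected to be classified correctly by the phase-one procedure.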


TRIAL CALCULATIONS 


A series of trial calculations was carried out to investigate the
sensitivity of the sample sizes and allocations to the design factors.
Table B-1 gives the values of the design factors used in the trials;
Table B-2 presents the sample sizes that resulted. The last column of
Table B-2 (ρ_xy) is the correlation between the first phase, x(g), and the
second phase, y(g), values.


Trials 1 through 5 explored the sample size and allocation conse- 
quences of gradually relaxing the performance standards, in terms of de- 
creasing the K(1) and K(2) values of the phase-one data collection proced- 
ures while the remaining design factors were held constant. Trials 6 
through 13 explored the relationship between sample sizes and the relative 
sizes of the post-strata, N(j). 


In trials 14 through 17, the influence of the intracluster correlations,
due to the two-stage area sample, was explored. The ρ-value (column 6 of
Table B-1) expresses the correlation, with respect to the observation
variables of interest, among river reaches in the same CU, averaged
over the CU's in the first-stage area frame. It seems reasonable to expect
positive correlations among reaches in the same CU. However, the magnitude
of these correlations for the set of questionnaire items was unknown.


Trials 18 through 20 explored the influence of changing the cost
ratio, C_1/C_2, on the allocation of the first- and second-stage samples.
Trial 19 was based on the assumption that per-reach costs were twice the
per-CU costs, while trial 20 was based on the assumption that the per-CU
costs were twice the per-reach costs. Similarly, trials 21 and 22 involved
the cost ratio per reach between the phase-one and phase-two data
collection procedures. Although the phase-two data collection procedures
were not specified, they were expected to cost considerably more on a
per-reach basis than the phase-one procedures. Accordingly, cost ratios of
1:5 and 1:10 were used.


Trials 23 through 26 showed the influence of changing the precision 
requirements for estimates of different population proportions. Trials 23 
and 24 required a relative standard error of 10% and 5%, respectively, for 
estimates of a population proportion of 0.10 (i.e., a domain consisting of 
10% of the total population of reaches). Trials 25 and 26 imposed the same
precision requirements for estimates of a population proportion of 0.25
(i.e., 25% of the population).


Table B-1. Trial calculation specifications

                                   Design factors
Trial
 no.       P  [N(1)+N(2)]/N   N(1)/N    K(1)    K(2)     ρ_c       Δ  C_I/C_II   C_1/C_2

   1    0.10           0.90     0.25    1.00    1.00    0.01    0.10      0.20      1.00
   2                                    0.99    0.90
   3                                    0.95    0.80
   4                                    0.90    0.70
   5                                    0.90    0.67
   6    0.10           0.90     0.25    0.99    0.80    0.01    0.10      0.20      1.00
   7                   0.80
   8                   0.70
   9                   0.60
  10                   0.90     0.15
  11                   0.80
  12                   0.70
  13                   0.60
  14    0.10           0.60     0.15    0.99    0.80    0.01    0.10      0.20      1.00
  15                                                    0.05
  16                                                    0.10
  17                                                    0.20
  18    0.10           0.60     0.15    0.99    0.80    0.05    0.10      0.20      1.00
  19                                                                                0.50
  20                                                                                2.00
  21    0.10           0.60     0.15    0.99    0.80    0.05    0.10      0.20      2.00
  22                                                                      0.10
  23    0.10           0.60     0.15    0.99    0.80    0.05    0.10      0.10      2.00
  24                                                            0.05
  25    0.25                                                    0.10
  26                                                            0.05
  27    0.20           0.60     0.15    0.99    0.80    0.05    0.10      0.10      2.00

P = population proportion estimate.
N = N(1)+N(2)+N(3), where the N(j) are the sizes of the 3 post-strata.
K(j) = the probability of a correct classification for post-stratum j.
ρ_c = intracluster correlation (reaches in same cataloging unit).
Δ = relative standard error of survey estimate of P.
C_I/C_II = cost ratio, first-phase to second-phase samples.

C_1/C_2 = cost ratio, first-stage to second-stage samples.


Table B-2. Sample size and allocation results

                                                                  Overall
Trial                      Sample sizes                       correlation
 no.        n     n_1   n_2       m   m(1)    m(2)    m(3)     ρ_xy

   1    1,250     125    10      67      0       0      67    0.902
   2    2,500     250    10     764     71     554     139    0.771
   3    3,040     304    10   1,258    188     898     172    0.692
   4    3,370     337    10   1,618    287   1,140     191    0.644
   5    3,410     341    10   1,668    291   1,184     193    0.635
   6    2,870     287    10   1,092     81     848     163    0.699
   7    2,930     293    10   1,147     83     732     332    0.680
   8    2,980     298    10   1,201     84     610     507    0.664
   9    3,040     304    10   1,258     86     483     689    0.650
  10    3,040     304    10   1,261     52   1,037     172    0.685
  11    3,090     309    10   1,319     53     915     351    0.668
  12    3,150     315    10   1,377     53     788     536    0.654
  13    3,210     321    10   1,437     54     656     727    0.642
  14    3,210     321    10   1,437     54     656     727    0.642
  15    3,064     766     4   1,453     55     663     735    0.642
  16    3,399   1,133     3   1,459     55     666     738    0.642
  17    3,400   1,700     2   1,459     55     666     738    0.642
  18    3,064     766     4   1,453     55     663     735    0.642
  19    3,132   1,044     3   1,441     55     657     729    0.642
  20    3,414     569     6   1,471     56     671     744    0.642
  21    3,414     569     6   1,471     56     671     744    0.642
  22    4,368     728     6   1,332     50     608     674    0.642
  23    4,368     728     6   1,332     50     608     674    0.642
  24   17,484   2,914     6   5,328    202   2,431   2,695    0.642
  25      840     140     6     111      4      51      56    0.888
  26    3,354     559     6     443     17     202     224    0.888
  27    1,308     218     6     223      8     102     113    0.830

n = number of reaches in sample (phase I).
n_1 = number of cataloging units in first-stage sample.
n_2 = number of reaches per cataloging unit in second-stage sample.
m = number of reaches in sample (phase II).
m(1)-m(3) = number of sample reaches in each stratum for phase II.
ρ_xy = correlation between phase-I and phase-II observations.
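
The entries of Table B-2 satisfy two simple identities that are useful when checking a transcription of the table: the phase-one sample is the product of the two stage sizes, and the phase-two sample is the sum of its post-stratum allocations. The sketch below checks a few rows copied from the table; the data structure and names are illustrative.

```python
# trial: (n, n_1, n_2, m, (m(1), m(2), m(3))), copied from Table B-2
rows = {
    1: (1250, 125, 10, 67, (0, 0, 67)),
    20: (3414, 569, 6, 1471, (56, 671, 744)),
    24: (17484, 2914, 6, 5328, (202, 2431, 2695)),
    27: (1308, 218, 6, 223, (8, 102, 113)),
}

def row_is_consistent(n, n1, n2, m, m_j):
    """Check n = n_1 * n_2 and m = m(1) + m(2) + m(3)."""
    return n == n1 * n2 and m == sum(m_j)

checks = {trial: row_is_consistent(*row) for trial, row in rows.items()}
```

A row that fails either identity most likely contains a transcription error in one of its entries.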





Trial 27 involved the final recommendations for the factor values to 
be used in designing the survey. 


TRIAL CALCULATION RESULTS 


Trials 1-5 





The best of circumstances, involving no misclassification of either 
the definite or the probable post-strata samples and a missing information 
problem of little or no consequence, is reflected in trial 1. Under this 
circumstance, the correlation between the values of x(g) and y(g) would 
optimistically exceed 0.90. As the probabilities of misclassification er- 
rors in phase I increase, the size of the phase-two sample increases dra- 
matically for quite small increases in misclassification rates, with a 
concomitant drop in the overall correlation. The trial calculations sup- 
ported the conclusion that the phase-one data collection procedures must 
perform well, producing overall correlations between the first and second 
phase results of better than 0.75-0.80. If correlations fall much below 
these levels, arguments can be made for using phase-two procedures alone, 
even in the face of sizeable cost differentials between the two procedures. 


The definitely yes or no post-stratum probably contains reaches that
have been intensively studied at the local level. The studies performed,
however, are unlikely to be of uniform quality nationally and may have had
informational requirements different from those for this study. It would
seem that some opportunity for misclassification should be afforded in this
stratum. However, if the phase-one data collection procedures are at all
appropriate, the probability of misclassification in this stratum must
obviously be small. Accordingly, the final design was based on the
assumption:


K(1) = 0.99 


On the other hand, the probably yes or no post-stratum is seen as con- 
taining reaches that have not themselves been studied, but which form parts 
of river systems that have been studied at other locations. The responses 
obtained for these reaches were inferred from the information available 
about the river system as a whole. While the opportunities for misclassi- 
fication errors were correspondingly increased, the information obtained 
must still be generally correct. Probabilities of correct classification 
on the order of: 


K(2) = 0.80 
would seem to be required for the phase-one procedures to have merit. 


Trials 6-13 





The performance of the phase-one data collection procedure was
influenced by the relative sizes of the post-strata in the population, as
well as by the misclassification probabilities in each. Of particular
concern was the size of the missing information post-stratum using these
procedures. Trials 6 through 9 and 10 through 13 show the sample size
implications of increasing the relative size of the missing information
stratum from 10% to 40% of the population. In the first set of trials, the
definite post-stratum is fixed at 25% of the total; in the last set, it is
fixed at 15%.


As indicated both by the changes in sample sizes and the changes in 
overall correlations, the allocation solutions are relatively insensitive 
to changes in post-stratum sizes over the ranges in sizes for the high K- 
values used in the calculations. That is, the phase-one procedures can 
tolerate a fairly high missing data rate if the misclassification errors in 
the other two post-strata are few, which seems reasonable to expect. Given 
that sample reaches were selected without reference to the importance of 
the reach from a sport fishing point of view, the missing information 
stratum might be expected to be quite large, perhaps on the order of the 
40% used in trials 9 and 13. For the same reason, the size of the definite 
stratum, corresponding to reaches that have been extensively studied, might 
be expected to be quite small, perhaps on the order of 15% of the popula- 
tion (trials 10 to 13). These values were recommended as design factor 
values. 


Trials 14-22 





The concept of intracluster correlations, despite its prominence in 
statistical literature dealing with sampling, seems not to have been stud- 
ied in field investigations of fisheries. Of particular interest to this 
survey is the average correlation (or covariance) that might be expected 
among river reaches in the same cataloging unit for different questionnaire 
items. In general, as these correlations increase, that is, as reaches in 
a cataloging unit more closely resemble each other, the minimum variance 
sampling strategy moves in the direction of requiring a larger sample of 
cataloging units, but with fewer observations in each unit. This result is
demonstrated by the changes in n_1 values versus n_2 values in response to
increasing the magnitude of the intracluster correlation, ρ_c (trials 14 to
17). For values of the intracluster correlation above 0.20 where the cost
ratio, C_1/C_2, is close to unity, second-stage sample sizes are reduced to
a single reach per cataloging unit.
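
The trade-off between the number of cataloging units and the number of reaches per unit can be sketched with the usual cost-optimal cluster-size rule, in which n_2 is the square root of (C_1/C_2)(1 - ρ)/ρ. The sketch assumes that this rule, rounded to a whole reach, is what drives the n_2 column of Table B-2; the inputs are the Table B-1 entries for the trials shown, and the function name is illustrative.

```python
from math import sqrt

def reaches_per_cu(cost_ratio, rho):
    """n_2 = sqrt((C_1/C_2) * (1 - rho) / rho), rounded, at least 1."""
    return max(1, round(sqrt(cost_ratio * (1 - rho) / rho)))

# (trial, C_1/C_2, intracluster correlation) from Table B-1
trials = [
    (14, 1.0, 0.01), (15, 1.0, 0.05), (16, 1.0, 0.10),
    (17, 1.0, 0.20), (19, 0.5, 0.05), (20, 2.0, 0.05),
]

n2 = {trial: reaches_per_cu(c, r) for trial, c, r in trials}
```

The resulting values of 10, 4, 3, 2, 3, and 6 reaches per cataloging unit agree with the n_2 entries for these trials in Table B-2.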


The influence of the cost ratio, C_1/C_2, was demonstrated in trials 18
to 20. Increasing one cost relative to the other directs the solutions
toward taking fewer of the more expensive units. Likewise, the influence
of the cost ratio, C_I/C_II, associated with the phase-one versus the
phase-two data collection procedures, also suggests taking fewer of the
more expensive units, as is demonstrated by trials 21 and 22.


In the absence of any evidence to suggest the magnitudes of the
intracluster correlations likely to be experienced and the per-unit costs
expected to be incurred, the results in Table B-2 suggest a fairly large
first-stage sample of cataloging units, each with relatively few sample
reaches.


Trials 23-26





The trial calculations to this point have all required a relative
standard error (Δ) of 10% for estimates of a population proportion (P) of
0.10. The minimum sample sizes required to achieve this level of precision
seem unattainable, given the budget constraints under which the survey 
would need to be operated. Requiring even greater precision (e.g., a 5% 
relative standard error for estimates of proportions of 0.10, trial 24) is 
certainly unattainable, because essentially all cataloging units in the 
first-stage frame are included in this trial. (Note, in this respect, that 
the finite population effect was ignored in the theoretical development, 
leading to the n, value in trial 24 exceeding the number of cataloging 
units available. The result, nonetheless, demonstrates the point). 


However, requiring a 10% relative standard error for estimates of 
population proportions of 0.25 does seem feasible, as indicated by the 
sample sizes computed in trial 25. The overall correlation between the x 
and y values is also increased, given the large P-value. Requiring a 5% 
relative standard error for estimates for the larger P-values (i.e., trial 
26) also seems out of reach, given budgetary constraints. 


Recommended Design Values, Trial 27 





The recommended design values, following the above discussion, are
summarized in trial 27 in Table B-1. It is recommended that sample sizes
sufficient to provide a relative standard error of Δ = 0.10 for estimates
of population proportions of P = 0.20 be used.


The precision requirements are restated in more familiar terms in 
Table B-3. The table lists the approximate 95% confidence intervals ex- 
pected with National estimates of selected population proportions. The 
table was generated using twice the standard error of the estimate to
approximate the interval, and assuming a total design effect of 3.27. The
design effect can be thought of as the ratio of the recommended total
sample (i.e., 1,308 reaches) to the simple random sample size required to
obtain the same standard error.
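
The interval calculation just described can be sketched in a few lines. The sketch assumes the standard error is that of a simple random sample of 1,308 reaches inflated by the design effect of 3.27, which is equivalent to a simple random sample of 400 reaches; the names are illustrative.

```python
from math import sqrt

N_RECOMMENDED = 1308      # recommended total sample (trial 27)
DESIGN_EFFECT = 3.27      # total design effect stated in the text

def interval(p):
    """Approximate 95% interval: p plus or minus twice the standard error."""
    se = sqrt(DESIGN_EFFECT * p * (1 - p) / N_RECOMMENDED)
    return p - 2 * se, p + 2 * se

proportions = (0.05, 0.10, 0.20, 0.25, 0.50)
table = {p: tuple(round(v, 2) for v in interval(p)) for p in proportions}
```

For a proportion of 0.20 the standard error is 0.02, which is the 10% relative standard error of the recommended design, and the interval is 0.16 to 0.24.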


To achieve these precision levels, the phase-one data collection pro- 
cedures must perform very well indeed. Specifically, misclassification er- 
rors should be essentially nonexistent in the definitely yes or no post- 
stratum, while the probability of misclassification errors in the probably 
yes or no stratum should be 20% or less. The combined size of these
post-strata is expected to be 60% or more of the total.


Table B-3. Approximate 95% confidence intervals for
           estimates of selected population proportions

Population proportion      Approximate interval estimate

        0.01                     less than 0.02
        0.05                     0.03 - 0.07
        0.10                     0.07 - 0.13
        0.15                     0.11 - 0.19
        0.20                     0.16 - 0.24
        0.25                     0.21 - 0.29
        0.30                     0.25 - 0.35
        0.35                     0.30 - 0.40
        0.50                     0.45 - 0.55


APPENDIX C. FIRST-STAGE SAMPLE OF CATALOGING UNITS


This Appendix describes the format of the sample file of cataloging units 
selected in the first-stage sampling as well as the format of the frame file from 
which the sample was selected. The Appendix also lists the data describing the 
first-stage sample, in Table C-1, with the column headings keyed to the format 
description. 


SAMPLE FILE 

The first-stage sample listing is contained in data set 
RTI.A25.P022050.CHB.SAMPLE on a 9-track tape RA1117, 6250 BPI. The 
record length is 103 and the blocksize 12978. The format is as follows: 


Key   Column(s)   Format   Description

A     1-8         2A4      Cataloging unit code
B     10-14       A4,A1    Ecoregion province
C     16          I1       Ecoregion domain code
                             2 = 2000 or 4000 (humid)
                             3 = 3000 (dry)
D     18          I1       River type code
                             1 = "large river"
                             2 = other
E     20          I1       Rural-urban code
                             1 = rural
                             2 = urban
F     22-29       F8.2     Size measure (length of river reaches in inches,
                             at 1 to 500,000 scale, nearest 1/4 inch)
G     31-38       F8.4     Percent irrigated cropland
H     40-47       F8.4     Percent nonirrigated cropland
I     49-56       F8.4     Percent range land
J     58          I1       Irrigated percentage code (within urban-rural x
                             ecoregion domain combination)
                             1 = low
                             2 = high
K     60          I1       Nonirrigated percentage code (within urban-rural x
                             ecoregion domain x irrigated code combination)
                             1 = low or only
                             2 = higher
                             3 = highest
L     62          I1       Range land code (within urban-rural x ecoregion
                             domain x irrigated code x nonirrigated)
                             1 = low or only
                             2 = high
M     64-65       I2       Stratum
N     67          I1       Replicate no.
O     69          I1       Zone (order in which selected)
P     71-85       E15.8    Expected number of times to have been selected
Q     87          I1       Number of times selected
*R    89-103      E15.8    Weight (reciprocal of expected number of times to
                             have been selected, in columns 71-85)


"0" represents a "missing value" for items not applicable to
units selected with certainty.
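
A record in this layout can be read by slicing fixed column positions. The following is a minimal sketch; the sample record is hypothetical (it is built to the layout rather than copied from the tape), and the names are illustrative.

```python
# (key, first column, last column, converter); columns are 1-based, inclusive.
LAYOUT = [
    ("A", 1, 8, str), ("B", 10, 14, str), ("C", 16, 16, int),
    ("D", 18, 18, int), ("E", 20, 20, int), ("F", 22, 29, float),
    ("G", 31, 38, float), ("H", 40, 47, float), ("I", 49, 56, float),
    ("J", 58, 58, int), ("K", 60, 60, int), ("L", 62, 62, int),
    ("M", 64, 65, int), ("N", 67, 67, int), ("O", 69, 69, int),
    ("P", 71, 85, float), ("Q", 87, 87, int), ("R", 89, 103, float),
]

def parse_record(line):
    """Split one 103-column record into a dict keyed by the letters above."""
    line = line.ljust(103)
    return {key: conv(line[first - 1:last].strip())
            for key, first, last, conv in LAYOUT}

# Hypothetical record built to the layout (not a row of the actual file).
record = (
    "05100201 M2111 2 2 1 %8.2f %8.4f %8.4f %8.4f 1 1 1 %2d 1 1 %15.8E 1 %15.8E"
    % (42.75, 0.0033, 3.4134, 0.7681, 1, 0.0208608, 47.938)
)

rec = parse_record(record)
```

In this made-up record the weight in field R is the reciprocal of the expected number of selections in field P, as the format description requires. Blank numeric fields would need separate handling, which the sketch omits.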


FRAME FILE 


The sampling frame used for selection of the first-stage sample is
contained in data set RTI.A25.P022050.CHB.FRAME on a 9-track tape RA1117,
6250 BPI. The record length is 103 and the blocksize 12978. The format is
the same as for the sample file, except for columns 87 to 103. For the
frame file, columns 89 to 98 contain the random number used in ordering
within the stratum (format F10.8); columns 99 to 103 are ignored.


Table C-1. First stage sample of cataloging units.

No. A B C D E F G H I J K L M N O P Q R
2 05100201 2211 #2 2 3 42.75 0.00334 304134 0.7681 12 #12 2 2 2 1 660208608 1 47.938 
2 64010252 2112 2 2 i 36.60 6.60353 1.9995 0.1723 1 i 1 i 1 2 0.0175667 1 56-926 
3 17020004 M2111 2 2 1 43.00 12095108 201976 18.3875 1 1 1 i 1 3 0.0209824 1 487.659 
4 1801212 ¥2610 2 2 i 36.00 0-38672 0.8415 8.1425 i i i i i & 0.0175667 1 56.926 
5 17010101 ¥2112 2 2 1 82-25 0.71573 222935 22-0825 1 i 1 1 2 1 0.040135" 1 24.916 
6 18010199 2412 2 2 1 13.25 1.60994 5.86121 23-1309 1 i 1 i 2 2 ¢.0064655 1 154.667 
7 61070003 2214 2 2 1 51.75 6.95994 3.7384 0.3758 i 1 i 1 2 3 0.0252521 1 39.601 
e 07010105 2111 2 2 1 $5.25 0.06991 4.8227 0.7609 1 1 1 1 2 a 0.0464785 1 21.515 
S 1202C006 2320 2 2 1 54.00 £.10502 4.4028 5.9202 1 1 i 1 3 1 ¢.C263500 1 37.951 

ict 08040206 2320 2 2 1 64.25 C-11°80 627820 1.5682 1 1 1 1 3 2 0.0313517 i 31.896 

i1 17920007 ¥2415 2 2 1 7.00 1.33586 223944 19.850? 1 1 1 1 3 3 0.9034157 i 292-762 

12 09030001 2111 2 2 1 114.25 0.00303 1.7307 0.1981 12 2 2 & 3 4 £0.0557498 1 17.°37 

13 1710¢€310 "2415 2 2 1 30.50 1.228380 12-7891 621055 i 1 1 | 4 1 0.-9148829 1 672191 

14 17010207 ¥2112 2 2 1 43.25 1-18132 224652 323655 1 1 i 1 4 2 0.0211044 1 47.3853 

15 1020002 2118 #=2 2 1 62.00 0.01935 166167 0.1013 1 2 2 42 4 3 0302537 1 33.054 

16 0401291 2112 2 2 1 144.00 0-04326 3.8769 0.4872 1 1 | 1 4 4 0.0792668 1 14.251 

17 03170007 2311 #2 2 1 69-25 (C.05797 10.8071 16777 1 2 & 2 & 2 660337915 1 29.593 

le 02050206 2214 #=+2 2 1 86.50 9.02224 18.9728 168939 1 2 2 2 4 2 O60822088 1 23.692 

1$ 01086C107 2114 2 2 i 29-50 0.00878 724382 1.2867 1 2 1 2 1 3 0.01435949 1 69.469 

2€ 08060102 2113 2 2 1 136675 79341 13.9835 1.0682 1 2 1 2 1 4 £60667290 1 14.986 

21 o#osc0ies 2113 2 2 1 35000 0636513 1266629 0.65376 1 2 4 2 2 § £O-0179787 1 58.552 

22 03070105 2311 #2 2 41 33.50 1631618 21-9643 1.7795 1 #2 4 2 2 2 0163468 1 61.174 

23. 17090006 241s 2 2 41 35.75 2616252 15.8634 2.4174 2 2 & 2 2 3 O-0178087 1 57.324 

24 02050205 2113 2 2 41 35.50 0.02350 12168389 166057 1 2 2 2 2 4& O60173227 1 57.728 

25 03096201 2320 2 2 4 153650 628601 24-6311 160141 12 #2 2 2 3 2 O-0789924 1 13.351 

26 @3000108 2320 2 2 1 38.75 0.19504 19.8817 1.5030 1 2 12 2 3 2 0189086 1 520886 

27 03060198 2320 2 2 1 32-75 0.74573 2424249 222553 1 2 1 2 3 3 0.0159808 1 62.575 

28 08080208 2311 #2 2 1 S2.C0 1636807 1069853 160782 121 2 12 2 3 4 60253741 1 39.410 

29 03050207 2320 2 2 41 34.75 (C.1802% 19.8173 1.60254 12 2 2 2 4 a2 06969567 1 58.974 

30 ogose20e 2311 #2 2 41 S200 1636807 1069853 160782 1 2 2 2 4& 2 O69253741 1 39.410 

31 o80ee101 2113 2 2 1 46.00 0.97602 1663167 0.6845 1 2 & 2 4 3 60224463 1 44.551 

32 3160101 2320 2 2 1 91.50 0.02859 2563142 4.0606 1 2 2 3 12 2 O69896987 1 22.397 

33 22060201 2512 2 2 3 67625 0660248 2062676 5365321 1 2 2 3 4 2 CeM328155 1 30.473 

34 11010005 2215 2 2 i 69.5 0.069354 22-9047 10.1936 1 2 2 3 1 3 0.0436727 i 22-898 

35 05020003 2211 2 ? 41 Be0C 0602163 1565153 4263199 1 #2 2 3 2 4 060039937 1 2566167 

36 11140104 2512 2 2 1 42.75 0.57336 20.0480 43.1425 1 2 2 3 2 1 0.928604 1 $7,938 

37 @7o1e10s 2111 #2 2 1 81650 57203 1765824 361798 1 #2 2 3S 2 2 49397699 1 254145 

38 6415303 2115 2 2 1 29-50 0.01156 18.9379 5.5519 1 2 2 3 2 3 0.01435949 1 69.469 

39 0804010) 2320 2 2 } 67.05 "222007 10.6222 3.1910 1 2 2 5 2 4 ®.0326936 i 30.587 

40 02070011 2320 #2 2 43 83.75 0613699 20.9596 2.9224 1 2 2 3 3 1 Oarse6o 1 24.470 

$1 03160108 2320 2 2 1 64.70 C.02619 20-2656 62-0556 1 2 2 3 5 2 0.9312297 | 32221 

#2 12020005 2311 2 2 41 60.00 6.05983 3147155 761926 1 2 2 38 3 3 060292778 1 34.156 

43 12090201 2522 2 2 i 44.25 0.35887 1226699 67-7748 1 2 2 3 3 4 0.0215924 1 46.315 

4a 03060108 2320 2 2 13 72050 [94795 1668333 3.6319 1 2 2 3 4 2 Oe0398899 1 28.662 

45 07010104 2111 2 2 1 €1.59 °.57103 17.5824 3.1798 1 2 2 3 4 2 0.0397693 1 252145 

46 12610005 2311 2 2 1 153.25 1-5€652 8.2762 720509 1 2 2 3 4 3 0.07497804 1 13-372 

47 32010003 2215 2 2 3 09.50 0.06934 22.9047 10.1936 1 2 2 3 4 & £G.0436727 1 22.898 

48 09020314 2111 2 2 1 24.09 *.78060 35-8689 123561 1 5 1 4 i ! 0-9117111 1 A5-.3589 
Table C-1. continued 
No. A^a B C D E F G H I J K L M N O P Q R 
49 10230005 2531 2 2 1 127-25 0.6855 63.0517 4.3008 i 3 1 4 1 2 0.062093 1 16.395 
506 05130103 2215 2 2 1 $5.25 0.0225 33-5566 3.23968 1 3 1 4 1 3 02946479 i 21.515 
51 05120196 2511 2 2 1 69.75 C.6365 7524716 129299 i 3 1 4 1 4 0.034035 1 29.581 
52 07070005 2115 2 2 1 137.¢0 1.9497 34.1241 2.29735 1 3 i 4 2 i 6.066851 1 14.959 
53 07130004 2511 2 2 1 51-50 1.7046 84.5791 0.9975 1 3 1 4 2 2 C-025133 1 39.793 
54 63010203 2320 2 2 1 49.50 9.c19e 2726542 0.7032 1 3 i 4 2 3 0.019765 1 50.691 
55 naiccoce 2212 2 2 1 24.50 C.0¢592 7526652 0.9470 1 3 1 4 2 4 0.011955 i 83.646 
56 05130105 2215 2 2 1 $5.25 0.0225 33.5566 3.3968 1 3 i 4 3 1 02946479 i 21.515 
57 080102Cée 2215 2 2 1 59.¢0 0.0412 41.4667 3.3608 1 3 i 4 3 2 0.028790 i 34.734 
58 97140197 2215 2 2 1 55.¢6 0.7258 34.0653 3.3514 i 3 1 4 3 3 0.025862 1 38.667 
59 09020103 2531 2 2 1 63-00 0.7802 50-1150 3.4659 1 3 1 4 3 4 0-930742 1 32529 
60 07060207 2531 2 2 i 55-25 0.0195 87.3671 224536 1 3 1 a 4 1 0.026960 1 37.2992 
61 07130007 2511 2 2 1 635.00 0.¢258 78.6214 229449 i 3 i 4 4 2 0.939742 1 32529 
62 07040005 2215 2 2 i 25-50 0.3808 3725729 325925 i 3 1 4 7 3 0.012443 j 80.366 
€3 07010107 2215 2 2 1 41.00 1.4425 42-2104 4.6500 1 3 1 4 & a 0.020007 1 49.984 
64 102°0101 2511 2 2 i 71.60 0.4045 46.5047 26-9078 i 3 2 5 1 1 62034645 i 28-f64 
65 10160001 2532 2 2 1 71.59 0.1174 65-7015 26-9639 1 3 ° 5 i 2 0.034889 1 28-662 
66 19280105 2511 2 2 1 97-75 0.1457 63.9199 10.1499 i 3 2 5 i 3 0.047698 1 20.965 
67 10260005 2535 2 2 i 29-25 1.4923 3620513 28.6587 1 3 2 5 1 4 0.014275 } 70.765 
68 1160005 2532 2 2. 1 146.59 0.2166 69-2193 24.3927 1 3 2 5 2 1 *.071487 i 13.989 
69 11070206 2511 2 2 1 23.50 0.1675 36-0550 19.9911 1 3 2 5 2 2 0.011467 i 87.206 
70 10170202 2531 2 2 1 92-75 0.6764 67-9048 15.7315 1 3 2 5 2 3 0.94525° i 222995 
71 09920204 25352 2 2 i 50.50 0.0850 77-8662 16-3129 1 3 2 5 2 4 0.024642 1 40.581 
72 09020109 2531 2 2 i 52-50 0.03°4 85.5420 9.65861 1 3 2 5 3 i 32025618 1 39.935 
73 10160011 2532 2 2 1 183.25 t.3556 6601174 22-9545 i 3 2 5 3 2 0.089419 1 11.183 
74 10160009 2532 2 2 1 78-60 ©.54S53 55.7088 37.8666 i 3 2 5 3 3 0.038061 1 264274 
75 09020107 2531 2 2 i 38-25 0.1482 81.8840 5.9398 i 3 2 5 3 4 9-918665 1 53.577 
76 09020202 2532 2 2 1 47.50 0.1476 73.9400 21.3048 1 3 2 5 4 i 9.023178 1 43.144 
77 66040002 2215 2 2 1 56-25 0.0279 38.4102 5.94091 1 3 2 5 4 2 0.027448 i 36-435 
78 11070105 2511 2 2 1 75.00 0.3606 $¢0.3907 33.6184 1 3 2 5 4 3 0-0356597 i 27-324 
19 05100192 2211 2 2 i 46.59 0.1244 59.1670 10.9658 1 3 2 5 4 4 0.02269¢ 1 94.072 
e0 10270297 2531 2 2 i 72-75 11.4944 60.1601 17.2825 2 1 1 6 i 1 0.055499 i 28.170 
el 10220001 2532 2 2 1 101.75 11.7459 31.9791 44.7442 2 1 1 6 1 2 0.049659 1 20.141 
82 17070106 w2415 2 2 } 36.00 525522 6.1061 20.2508 2 i i 6 i 3 0.017567 1 56-926 
e5 10210005 2532 2 2 1 64.75 15.3868 20.1363 54.3525 2 1 i 6 1 4 G.031596 1 31.659 
e4 11050996 2535 2 2 1 36-00 17.3428 53.8216 2022342 2 1 1 6 2 1 0.017567 1 56°26 
e5 11010007 2215 2 2 1 104.¢0 2.7099 22-5955 2.8670 2 1 i 6 2 2 0.050748 i 19.795 
&6 19250095 2532 2 2 i 36.50 15.0719 386.4792 45.6124 2 i i 6 2 3 0.917811 1 56-146 
e7 18010104 M2414 2 2 1 19.9 4.1782 229869 14.5647 2 1 1 6 2 4 02099271 1 127.869 
e6 17010204 M2112 2 2 i 77-¢0 2+7067 224802 15-3280 2 i i 6 3 1 0.037575 i 26-615 
e9 16050201 "2610 2 2 i 26-0 4.9003 C.otcd 8.0970 2 i 1 6 3 2 0.012687 i 76.821 
99 08080295 23511 2 2 1 33.59 #-1CC0 186.7468 8.4224 2 1 1 6 3 3 C.016347 1 61.174 
91 ososo2c4s 2320 2 2 1 5C.00 5.0850 21-4615 1.6902 ° 1 1 6 3 4 9.024398 1 40.987 
$2 12100102 2512 2 2 1 58.70 52f997 3526904 355-6857 2 i 1 6 4 i 92028302 1 350535 
95 19250016 2555 2 2 1 120.25 14.5911 4F.e2120 24.0497 2 i i 6 4 2 9.058678 | 17.942 
94 03150006 2311 2 2 1 6°.75 28385 30.1564 53-7627 2 i i 6 4 3 0.°29644 1 33.734 
$5 11030005 2535 2 2 1 75.50 122$93?2 5é&.8490 2222225 2 1 1 6 4 4 0.056841 1 27.2143 
36 16060003 3132 $ 2 i 118.50 M.e224 0.0515 15.7806 1 1 1 7 1 i 0.957824 i 17.294 
Table C-1. continued 

No. A^a B C D E F G H I J K L M N O P Q R 

S? 14010091 3113 3 2 89.50 3.23898 162702 2122587 1 #1 1 7 2 2 e0843673 1 22-8976 
9@ 16047101 3133 3 2 2 154.25 2.00898 0.0968 21.3445 1 1 17 7 12 3S 66075268 12 # £13.2858 
SS 14060003 F3112 3 2 4 220625 3625288 0.4908 22.0221 1 #1 1 7 2 & 6102598 43 9.7471 
if «1604C1098 3133 3 2 4 52.50 2.00898 0.0068 22.3445 1 121 39 7 2 & 06025618 1 £39.0349 
191 6030003 w112 38 2 83.75 2689141 1.7564 10.8999 32 14 1 7 2 2 06090867 1 #£24.9697 
102 1732¢098 3131 3 2 41 40.50 2687770 0.3353 20.2320 1 1 12 7 2 3 6939763 1 £«453-6008 
1°73 16960013 3132 3 2 41 @3.5¢0 0.30535 0.0335 18.5269 1 1 1 7 3 2 06090745 1 24.5429 
194 14060006 3112 3 #2 1 49.00 1.24795 0.4720 18.9675 1 1 1 7 3S 2 0.023919 1 41.8231 
195 a705e1¢09 8693131 3 2 41 35.°C 2.88399 065157 1962250 1 1 1 7 3S 3 06017079 1 #£58.5528 
16 i705C108 3131 3 2 47 38.25 3624902 063362 16.4234 12 #1 1 7 3S 4 6018665 1 £«53.5773 
107 16060007 3133 3 2 4 105.00 1.57405 0.0069 17.1835 12 1 13 7 4  & 06051236 1 #£19.5175 
M6) }«=6 18070003 P3131 3S 2 2 297600 1622679 0.1771 #04491 2 #12 1 7 © 2 066057992 1 1745357 
109 17010212 3112 3 2 JY 16075 2231058 2.6798 1063602 1 #1 1 7 & 3 6037951 1 £26.7914 
119 160940109 3132 3 2 3 87675 2619902 0.0000 20.4269 1 #1 12 7 & 4 0.042819 1 23.3542 
111 432080007 3123 3 2 41 60.00 0.78758 3.3246 77269690 1 #1 2 ® 2 2 06029278 1 34.1556 
112 17970201 3111 3 #2? J 63.75 2620645 0.7872 3765081 1 #1 2 ® 2 2 6031108 1 3241464 
113 10090203 31213 3 2 4 52.00 1.21434 0.4514 83.1721 1 #1 2? 8 1 3 06025374 1 39.4103 
114 15020003 P3132 3 2 4 60.00 6628625 0.2910 25.2444 3 #2 2 8 2 1 66929278 1 34.1556 
115 18080101 3113 3 2 4 267625 3631754 161231 22-9232 1 1 2 ® 2 2 06081612 1 #£12.2531 
116 19050005 3113 3 2 1 42.25 1659616 201981 25.6842 1 #1 2 8 2 3S 066029616 1 48.5249 
117. 43030201 3221 3 2 41 58.75 1422049 066531 54.2990 1 2 8 2 4 #0,028668 1 34,8823 
118 17070201 ™3i12 3S 2 3 63.75 2620645 0.7872 3765081 1 i 2 8 3 42 6031108 1 32.1868 
11S 24080205 P3231 3 2 42 100675 0513544 0.5711 23.8609 1 1 2 ® 3 2 06049162 1 £23.34908 
120 20080003 Asie) 3 2 JY 86625 3620923 0.1256 28.5175 1 1 2? 8 3S 3 0.022568 1 £44,3999 
121 18050003 a8°’2 3 2 4 464.75 1673831 1.0575 35.5134 1 & 2 8 3S 4% 06053119 2 19.5649 
122 20120103 Sia2 3 2 41 51.50 0.95629 3.4057 82.9445 1 #1 2? 8 & 2 6025130 ¢! 39,7929 
123 13060006 F3132 5 2 1 21.90 1.22081 0.4259 70.4415 1 1 2 Pp 4 2 0.010247 3 97.5875 
124 10180006 a3192 3 2 4 4243.00 2607410 0.1861 46.6668 1 #1 2? 8 © %3$ 06069779 3 134.3310 
125 33080201 3212 3 2 41 56.50 0.60545 0.5493 79.5042 1 14 2 8 © © 6027570 3 36.2718 
126 amiseics§ 3112 3 2 43 36625 0.33440 56.4103 385.6182 1 #2 12 9 41 2 6017689 1 56.5333 
127 10090298 3112 5 2 1 $2.25 0.457590 6.8076 71-6586 1 2 1 a 1 2 920945015 1 2222159 
128 10140202 3312 3s 2 4 $7050 CoA38O01 15.6422 74.7685 1 2 1 © 42 3 6047576 1 21.9198 
128 10090209 31113 3 2 4 61.99 ¢.94855 7.2168 69.0366 1 #2 1 9 1 © 060929766 3 33.5956 
13@ 10140201 3112 3 2 2 209650 1649022 1266398 72.3767 1 #2 J 9 2 2 102228 1 9.7820 
131 «10130203 3312 3s 2 4 99.2 0.16263 5563734 41.9863 1 2 1 9 2 2 06048430 3 29.6482 
132 1090208 8 3392 s 2? 4Y 70.25 (C.84656 16.4524 53.5202 1 ? 12 9 2 3 66039279 3 29,1729 
iss 1032112 5112 3 2 1 51.75 C-33253 15-6376 73-3970 1 2 1 9 2 4 0.02525? 1 39.6706 
134 0070007 3122 3s 2? 4 76.00 2629884 10.8531 64.8821 1 2 9 3 4 66037085 1 26.9649 
135 20080101 P3112 3$ 2 & 150.50 3.39726 10.1758 40.1599 1 #2? 3 © 3 2 06973439 4 433.6168 
136 19038c2c«4 3211 5 2 1 31.25 ".30628 54.7475 54.8170 i 2 1 Q 3 5 9.015249 1 65-5787 
13? 30090192 S311 3 2 2 $44.25 4617193 5.9992 78.0274 1 2 43 9 © 2 06070389 1 1462068 
13@ 301216201 3112 SF — 8  UETTS — OH TOHZO =. 8068SE = 90STSS OBC? 9 © 2 6052578 1 19.0193 
135 190133¢2¢5 5112 3 2 1 57.50 C.09988 58.2796 60.7429 1 2 1 9 4 5 °.028058 1 355.6406 
149¢ 20100002 8111 3s 2 4 38.50 91959 5.8293 76.3256 1 ? 1 9 &  & 06018787 1 53.2294 
141 20180019 311s 38 2? 4&4 69025 4647931 0.54941 48.9916 2 1 df 40 F & 66337913 3 29,5933 
142 1025¢€005 54115 5 2 1 103.5" P.51 978 362353535 46.2865 2 1 1 190 1 2 0.059594 | 19.8005 
143 10920002 3112 38 2 4 69.00 6.75257 063129 3768929 2 1 F 20 fF S$ 06933669 3 29,7705 
144 1703¢093 ¥151 5 2 1 163.25 ®.51107 11614676 2661111 2 1 1 19 ] 4 9%.079669 1 12-5535 
Table C-1. continued 

No. A^a B C D E F G H I J K L M N O P Q R 

195 47050201 3311 3 2 12 66.25 4.0427 21-6841 24.1050 2 & 2 &0 %2 & 6032328 1 39.9333 
146 7076108 s129 3 2 13 35.50 Se2005 29.9134 48.1592 2 14 © $0 $%2 2 6026347 4 6101741 
147 30250003 3113 3 2 4 4683.50 BeS198 36.3335 46.2865 2 12 & 1310 & S$ O4.950594 1 19,8203 
198 seo02eec2 3131 3s 2 41 68.25 3.7463 165229 13.3329 2 & fF 438 2 & £O6093339% 4 30.9269 
199 s3010003 3113 3S 2 3 63.75 9.7671 C.9171 20.3635 2 1 4 4380 %S 4&2 OcO33108 1 32.1964 
15¢@ 10180008 3131 3 2 4 165.50 3.7414 *.6777 73.8127 2 & & 30 #%S 2 6.051989 1 19,4250 
151 17070108 3129 3 2 13 33.56 S-2003 29.4214 48.1592 2 © & 4308 #%S 8S eO9396347 1 6301741 
152 17040210 33313 3 2 41 36.25 7To3387 WetTST 26.6158 2 4§ %F %&38 3S & O6017689 1 56.5333 
153 210901% 3113 3 2 1 71.75 2248994 23.8811 63.7599 2 2 & 40 & =f 66035013 1 28.5621 
154 43010001 m3i33 3 2 13 78.00 6.1876 0.3505 9.8255 2 12 & 19 #%8© 2 6038061 1 26.2735 
155 avore2o2 r3si11 3S 2? 4&3 16.75 Se5735 1267996 33.2885 2 2 %F 30 j®8 3S 6037451 3 26.7718 
156 24030005 r3133 38 2 1 75.0¢¢ 4.4169 1.9021 47.9762 2 & && 280 §j& #& 06036597 1 27.3284 
15? @3070101 2320 2 2 2 24.00 0.0257 1364782 2.6996 2 2 fF 22 db & 06060507 1 16.5269 
158 010012 3133 3 2 2 56.50 0.1937 0.0393 6.3087 2 2 & 38 =f 2 6627570 1 3602718 
15S 3040101 2320 2 2 2 486425 0.16035 18.6407 504300 2 2 2 O88 2 3S 6e056726 1 1766287 
16¢ 22040102 23270 2 2 2 41.50 102532 23-938¢ 7865305 2 #29 8 28 Dd © 66020250 1 469.3815 
161 «2020006 3113 3 2 2 50.50 1.0814 2.365" 2.7516 2 2 Ff &8 2 & @6024642 1 40.5809 
162 2030105 2214 22 2 56.25 0.7704 17.4600 8545 2 2 2 8 2 2 06027448 1 3644326 
163 O1100008 2214 2 2 2 $1.50 0.4915 6«6113 5442 2 & © 28 2 §$ 06055372 1 65.0582 
164 6315202 2526 2 2 2 65.25 0.0480 12.2405 05725 2 & 2 4&8 2 © @6031890 1 31.9078 
165 1080206 2114 2 2 2 41.50 6.2743 7e1372 O.7098 2 § 3 G8 %S 8 6020250 4 449.3815 
166 e807¢202 2320 2 2 2 £8.25 0.0287 13.7000 3.6664 2 2 2 O28 S 2 0.043063 1 23.2219 
167 a7ace2ze2 2011 2 2 2 54.25 121677 7.3581 166704 &@ & 8 O88 3S 3S O6026872 1 #£37.7757 
168 3060208 2311 2 2? 2? 38.50 0.0163 3.6573 0.3339 32 & 3 38 3S & O,6098787 4 53,2294 
16S O3280302 2313 22 2 25.2% 0.3026 19.7536 6-2075 4 2 & G8 © @ OeO82322 4 860617 
17¢@ «617000202 4 «4w2etn 2 ltl? $4.25 1.1677 7.3581 1-670 2 © 9 82 © 2 06026472 4 37.7757 
171 1070002 2114 2 2 2 42365% 0.0798 3.8012 0.35491 2 & 2 $8 #& 3 0.055384 1 18.0558 
172 «#47130010 «82eis) 2 2 (Pe 22.00 0.2093 3.1526 0.5144 2 2 & 28 © © O,030735 2 93.1515 
173 O7120001 «= 28 2 2 2 1€4.25 0.7903 68.0867 1.0195 32 2 & 32 8 2 6050870 1 19.6579 
174 ceosceor? § 2212 2 2 ? 31.75 0.7994 40.5817 166625 21 #2 3 42 Lb 2 66015493 1 64.5459 
175 @S12020e 2212 2 2 e? 70.75 9.0785 58.71°° 1.5171 2 2 & 22 $d 3S 06039523 1 28.9658 
176 «608090005 § = 2212 2 2 2 31.25 0.4664 30.3249 100560 21 #2 & 482 $f & 06025249 1 65.5787 
17? €@3020203 2320 2 2 2 41.75 0.7611 32.9735 2.7035 1 #2 4 32 2 & 6020372 1 49.0858 
i7@ 30290108 2531 2 2 2 429.¢¢ @.331% 9.2722 41.7232 2 #2 & $2 $%2 2 06062947 3 15.8863 
179 «10270198 2513 2 2 2? 62.00 0.7837 81.7631 20.9709 2 2 F 42 2 3$ O603025¢ 1 33,0538 
18¢ «6608106205 8=— 2215 2 2 2 189.80 0.1408 51.8235 8.4279 1 2 & 22 $2 & 6077830 4 12.8985 
181 67010205 2213 2 2 2 73.50 0.4878 66.9560 2.3009 2 2 & 42 S f 6035865 1 27.8821 
182 eSoB8t00s 2212 2 2 ? 65.75 0.0768 S7.2718 3.2081 2 2 2 32 %S 2 06091843 1 23,8989 
163 412070203 2522 2 2 2? 49.25 0.2350 27.1878 49.5558 1 2 & 42 #+$%S 3 06024032 1 41.6108 
164 @5080001 2212 2 2 2 405.5¢ 9.1492 70.1356 2.0318 2 2 f 3? SS & 66051989 4 19,4259 
185 «30200012 2533 2 2 2 76.50 0.1119 66.5166 9.90865 21 #2 2 22 %& & @6037329 1 26.7887 
186 6©6©e8060002 0 8=6— 2.211 2 2 2 428.8¢ 0.0355 42.7799 3.3009 1 #2 & &2 +$%& 2 6062703 1 15.948] 
187 Cei3seoos 2213 2 2 2 46.2¢ 0.2677 41.5736 2.08671 121 #2 & 32 %& $ 06022568 1 44,3999 
1898 8 6esasez0e = =6 2215 2 2 2 36.50 7.0398 25.0672 #.4709 2 #2 & $2 %& © O,037811 1 56.1461 
189 20190008 3343 3 2 ? 25.75 2.9899 26.0863 3.3374 2 2 2 G3 2 & 06092565 1 79,5858 
19¢ 37060199 S120 3 2 ? 39.00 4.7193 52.5012 23.6903 2 F & 3S 2 2 66029031 1 £52.5470 
191 32020012 8 3485 3s 2 ? 68.5° S.A07% 2667142 5506878 2 1 =F BS 8 3S 06033426 1 29,9173 
192 10220003 2531 2 2 ? 66.00 1960586 67.%67) 729592 2 t & 23 8 © 06032206 1 31.0595 
Table C-1. continued 

No. A^a B C D E F G H I J K L M N O P Q R 
193 17090009 2416 2 2 2 62-25 51400 812.5447 1.4997 2 2 3 23 2 4 0693038 1 32-921 
194 18080003 3131 > 2 2 61.75 1.7929 0.3332 13-7924 2 14 = 33 #2 2? 93013 1 33.188 
185 18030001 2610 2 2 2 44.25 17.9292 205039 30.5999 2 1 #1 13 #2 3 0.02159 1 46.313 
196 10190903 3113 2 2 2 98.00 80732 1864597 44.6646 2 12 #1 #33 2 4 0.94782 1 20.912 
197 05090202 4110 2 2 2 268600 2346752 201219 2261450 2 2 12 23 «3S 4 13077 3 7.647 
168 180930012 261¢ 2 2 2 225675 21.2874 504415 23.3595 2 1 #42 13 #3 2 Oc11016 1 9.978 
199 18030003 2610 2 2 2 37.25 9.6665 506561 22.0429 2 #1 #4 143 3 #3 0691818 1 55.916 
200 18030005 2610 2 2 2 10.50 17.9202 205039 30.5999 2 1 1 13 %3S 4 0600512 1 #£195.175 
201 18090004 2610 2 2 2 7675 3329152 5.3563 33-3198 2 14 4d 33 & f 0609378 1 £264,439 
2¢2 38090013 M2610 2 2 2 28.50 8.7571 4.618% 20.6400 2 1 #4 43 & #%2 06091391 1 71.906 
2¢3 411090105 3113 > 2 2 42.00 9.9662 14.9979 70.5995 2 1 1 13 4 3 602049 1 48.794 
204 O3100206€ 2311 2 2 2 86.25 5.0026 11.5322 23.8719 2 12 2% 13 #& & £«6699209 1 23.769 
205 02950191 2113 2 1 %& 123650 0.9239 29.7456 406287 « «© « 24 © « 4600009 1 2209 
#06 62050103 2113 2 414 41 40.00 0.0357 23.8844 #.3813 « «© « 44% © « 00000 1 1.000 
207 02050106 2113 2 1 2 4118.75 0.0256 17.1509 205984 «2 «© «© 24 © «© 2080009 3 1.090 
2c 02050107 2214 2 1 12 106650 0.0519 1661732 106404 2. « « 24 «© « 4et9000 1 12609 
209 02050301 2214 2 1 41 64.75 0.0581 22.7769 102251 »« «© «© 124 © «© 480000 1 1.329 
210 o20503¢5 2214 2 1 41 84.75 001415 3767476 201393 « « o« 4% © «© 090000 1 1.000 
211 020503C6 2214 2 42 &@ 132675 0.1276 41.5897 207149 «lw ClCH ltl siCi De OLOO 1.009 
212 03150196 2320 2 1 2 109.50 0.0751 1342061 202076 « «© « 14 © © 09009 1 12°99 
213 03150107 232¢ 2 1 41 63.25 0.0443 1261915 202923 «© «© « 3% © oo 00000 1 1.790 
214 03150201 2320 2 1 2 8.59 0.1147 19.9361 88349 « «© © 14 © « 460009 1 1.990 
215 03150203 2320 2 1 4 87.50 0.6192 15464597 7.3481 « «© «© 34% © o 600009 1 1.099 
216 03150204 232¢ 2 13 41 50.50 0.3059 1161758 109671 « © «© 414 © «© 4699000 1 1.090 
217 03160204 2311 2 1 2 45.75 0.504C 11.1812 1.2051 « «© « 44 © © 4¢90009 3 1.990 
218 03160205 2311 2 1 2 13.25 066923 1349059 101246 « «© «© 34 © o 00000 1 1.090 
219 05030191 2214 2 1 2 91.25 0.9359 29.5673 308290 « «© « 2% © «© 600000 1 1.000 
222 05030106 2214 2 1 41 59.00 0.0321 1521930 607620 « «© © 414 © oo 0900 1 1.0090 
221 05030201 2214 2 1 1 98.50 0.0127 1566863 604396 «© «© © 2% © «© 000000 1 1.900 
222 05030202 2211 2 4 41 74.25 0.C16C 1544294 5.7537 « «© « 34 © « 00000 1 1-°CO 
223 05060101 2211 2 41 41 51.25 0.0068 12.2440 4.2595 «lw le Keli ODOD 8. 1.000 
224 05090103 2211 2 1 41 48.00 0.0057 1501706 3.4449 2. «© «© 34 «© © 4696900 1 1.090 
225 05090201 2211 2 4 2 4110.75 0.0599 34.2787 4.5762 «© «© «© 44 oo «© 4000000 1 1.099 
226 05050203 2215 2 1 2 90.°0 1.9775 3802321 601332 « © «« 12% oo ¢« 00000 1 1.999 
227 05140101 2215 2 1 2 62.25 0.0757 3749049 5.0064 « « « 1% © « 490000 1 1.098 
228 05140104 2215 2 1 3 56.50 0.0261 35.8578 304792 « «© «© 44 «© «© e009000 1 1.000 
229 05140261 2215 2 1 41 66.50 0.0299 44,8991 208937 « «© © 34 © © 299000 1 1.000 
23° 05140202 2215 2 1 2 40.50 0.0136 59.3257 2.9143 « «© «© 1% © «© 4699000 1 1.0900 
231 05140203 2215 2 1 431 53.50 0.0193 4%6.8878 4.6322 «© «© «© 44 © o ett900 4 1.900 
232 05140206 2215 2 41 43 45.75 9.1977 %1.83¢8 306599 « «© «© 34 © «© 00009 1 1.000 
233 06010104 2214 2 1 2 51.75 0.0485 24,3995 3.8933 « «© « 44 © «© 4690000 1 1.900 
234 06010201 2214 2 1 2 74.00 0.0183 18.4249 204266 « «© « 34 © «© 4090000 3 1.900 
225 o6c20001 2214 2 1 2 94.50 M.0192 1869396 2022%6 6 Cw lCid Hl ia OTDKO O 1.090 
236 06030001 2211 2 1 41 77.50 0.0232 20.8505 203198 «»« «© « 1% © oo 4099090 1 1.000 
237 06030002 2215 2 1 2 16.25 0.0467 33465445 4.3880 .« «© © 44 © «© 4690000 1 1.999 
238 06030005 2215 2 2 4 109.25 0.0103 24.5561 3.9198 « «© «© 3% © «© et0000 1 1.000 
239 06040001 2215 2 12 1 112625 0.0015 1962483 1.8847 « © « 34 © oo e9t900 3 1.699 
249 06040005 2215 2 14 41 95.90 2.7072 28.8766 2.7359 «ow Cl tlt(CidAt si HOOD OD 1.000 
Table C-1. continued 
No. A^a B C D E F G H I J K L M N O P Q R 
241 06040006 2215 é 1 1 40.75 0.0464 44.3845 228757 ° . ° 14 ° ° 1 1 1 
242 07040¢01 2215 2 i i 31.00 1.8122 57-3890 3.9¢31 . ° ° 14 ° ° i | 1 
243 07040C05 2215 2 i 1 63.75 0.2288 45.4237 3.8961 . . ° 14 ° ° i i 1 
244 07040006 2215 2 1 1 30.00 0.2452 44.4248 5.0427 . . ° i4 ° . i 1 1 
245 07060001 2215 2 i 1 53.59 0.9996 49.6472 5.4192 ° ° ° 14 ° ° i i i 
246 o706E0CO5 2215 2 1 1 52.75 0.1851 61.4749 9.0473 . ° ° i4 ° ° i | i 
247 07060C05 2215 2 i i 58.75 0.3370 68.8969 724720 . . ° 14 . ° 1 | i 
248 07080101 2511 2 i 2 37.00 0.6395 73-4831 *.1817 ° ” ° 14 ° . i i i 
249 07080104 2511 2 i i 117.00 0.3051 73.1281 5.18624 ° ° ° 14 ° . 1 i i 
250 07110061 2511 2 i i $2.75 0.0595 63-1476 7235745 ° ° ° i4 ° ° i 1 i 
251 07110004 2511 2 1 i 104.25 0.23516 66.2118 6.6144 ° . ° 14 . . i 1 1 
252 07110009 2511 2 1 1 19.25 0.1402 55.4076 4.1410 ° ° ° 14 ° ° i 1 1 
2535 07140101 2215 2 i 2 78-75 0.1258 46.9346 227644 ° ° . 14 ° ° 1 1 i 
254 07140105 2215 2 i 1 $9.00 1.06545 46.3792 321670 ° « . 14 ° . 1 1 1 
255 02010100 2312 2 | 2 129.25 2253545 66.9045 1.9437 . ° . is . “ i 1 i 
2356 08020100 2312 2 1 1 39.50 728136 60.7969 220714 ° ° ° 14 . ° 1 1 i 
257 08026401 2312 2 i 2 26-25 19.3850 31.9216 2-25975 ° ° ° 14 ° ° 1 1 1 
258 08030100 2312 2 1 1 40.50 6-6798 50.6819 1.6668 ° ° ° 14 ° . i | i 
259 06040301 2312 2 1 1 41.00 6.51358 36.7164 222225 . e ° 14 ° ° i i i 
260 02060109 2312 é 1 1 29.75 0.2794 36.2947 3.3888 . ° e 1¢ . e 1 i i 
261 02070100 2312 2 | 2 19.50 02-2714 20.2727 6.6186 ° e e 14 ° ° 1 1 1 
262 06090100 2312 2 1 2 30.25 0.0480 1.5240 5.94989 . ° ° 14 ° ° i i 1 
263 10110101 3112 3 i i 105.56 0.3472 56-4615 34.1528 * ° ° 14 ° ° i i i 
264 1¢13¢101 3112 $s 1 | 116-75 0.2427 60.2698 33-1415 ° ° ° i4 ° . i i i 
265 101308102 3112 3 1 1 1064.25 0.1852 39.5366 5722115 . ° ° 14 ° ° 1 1 1 
266 161350105 3112 $3 1 1 1535.00 0.35547 32-3472 67-5471 . ° . 14 ° . i i 1 
267 10140101 3112 3 ! i 139.59 6.6464 80.2537 51.7505 . ° . 14 . ° 1 1 1 
268 10178101 2531 2 1 1 129.00 4.7196 60.0088 24-6183 ° ° ° 14 ° ° 1 i i 
269 10230001 2531 2 1 i 76-00 224754 7720082 5.22508 ° . ° 14 ° . 1 i 1 
27¢ 10230006 2551 2 i 2 $2.25 1.1532 76-0238 3.4715 ° ° ° 14 ° . 1 1 1 
271 10240001 2531 2 i i 40.00 60.5719 76.0661 4.3062 ° ° . 14 ° « 1 1 1 
272 1¢240005 2511 2 i } 65.50 0.3844 75.7015 721067 ° ° . 14 ° ° 1 i 1 
273 10240011 2511 é i 2 47.00 0.2247 57.5445 10.0545 ° ° ° 14 . ° 1 i i 
274 1¢300101 2511 2 i 2 114.25 12-2150 94.4770 17-0325 ° ° . 14 ° ° | 1 i 
275 1¢300102 2215 2 i i 94.50 0.5792 49.9872 629679 ° ° ° i4 ° . i 7 i 
276 10300200 2511 2 1 1 59.50 0.7757 91.7559 4.1485 ° ° ° 14 ° ° i 1 1 
277 11030015 2535 2 ] 2 36-75 1.8944 63.*232 25-4179 . ° ° 14 ° ° i 1 1 
278 11066C01 2531 2 1 2 29-75 €.1219 35.3780 44.7168 . ° ° 14 ° ° 1 i i 
279 11060006 2512 2 i 2 78.56 €.1127 56-9768 4.7164 e ° . 14 ° ’ 1 i 1 
260 11110101 2511 2 1 2 39.75 0.1669 18-6859 $2223595 ° ° . 14 ° ° i 1 1 
261 11116102 2511 2 i 1 25-25 0.41862 24-5181 24.7914 ° . ° 14 ° ° i i i 
282 11116104 2511 2 i | 66.96 0.3156 186.4129 19.1758 . ° . 14 ° ° 1 1 i 
285 11110201 2215 2 1 1 71-75 0.3159 18.5356 6.8787 ° ° ° i4 ° . 1 1 1 
264 11110202 2215 2 i 1 #3259 0.2628 16.4728 623620 . ° ° 14 ° ° i 1 i 
285 11110205 2215 2 1 2 43.25 0.5797 20.0118 12.4564 ° ° ° 14 ° . 1 i | 
2866 11110207 2320 2 1 2 44.75 725974 22-5212 225917 ° . . 14 ° ° 1 1 1 
267 11130201 2512 2 J 1 68.09 0.2742 25-5154 49.9245 ° . . 14 ° ° 1 1 | 
268 11130210 2512 2 i 1 37.275 0.7343 2626864 37.7580 . . ° 14 ° . i i i 
Table C-1. concluded 
No. A^a B C D E F G H I J K L M N O P Q R 
289 111401301 2512 2 i 1 76.75 0.5255 33.1703 26-7622 . ; 14 . P i i 1 
29¢ 11140106 2320 2 i 1 76.00 0.3¢12 23.6291 16.6765 ‘ ‘ ‘ 14 i i i 
291 111490201 2320 2 i 2 4e@.75 0.6448 21.0236 8.0878 " ; 14 ‘ " i i i 
292 11146202 2320 2 i 2 12.75 0.2367 18.9944 7.4691 " i ‘ 14 ; i i 1 i 
293 11140207 2311 2 1 i 75.25 0.1773 10.3040 2.8564 ‘ ‘ _ 14 5 7 i i i 
294 17020001 r2111 2 i 1 62.06 4.4786 17.5292 19.6311 ‘ : ‘ 14 " . i i i 
295 17020005 3120 : 1 1 47.59 5.9202 11.2047 18.3163 . ‘ 14 ‘ ‘ i i i 
296 17020¢10 "2415 2 i 1 39.50 8.1331 15.6153 18.3615 . ‘ " 14 ‘ ‘ i i i 
297 17020616 3131 : i 1 56-75 13.4417 22.9789 2128162 ‘ ‘ ‘ 14 ‘ a 1 i i 
298 17970102 3131 3 i 1 8e.75 7.1760 24.4515 29.5581 . ‘ y 14 " ‘ 1 1 1 
299 17070105 M2415 2 i 2 51.00 4.5728 10.0643 25.9034 ‘ 7 . 14 ‘ " i i i 
300 17080003 2410 2 1 2 42.00 0.8302 5.7910 127112 ‘ ° ‘ i4 ‘ i 1 i 
301 1708003 P2413 2 i 1 37.75 6.2421 2.9857 6.8898 i" ° ° 14 ‘ ‘ 1 i i 
302 17080006 m2411 2 i 1 35.00 0.3402 324959 0.7231 ‘ . ‘ 14 ° ‘ 1 i i 





^a Letters representing column headings are keys to the descriptions in the text. 
APPENDIX D. SAMPLING USING PROBABILITY PROPORTIONAL TO SIZE 
WITH MINIMUM REPLACEMENT 


The equations and procedures described below refer to the second-stage 
selection of reaches from cataloging units in the nonlarge river strata. 
However, with obvious notational changes, they apply as well to the first- 
stage selection of cataloging units or to the selection of reaches from the 
cataloging units of the large river stratum. 


In strata $h = 1, 2, \ldots, 13$, a total of $n_2(h,i) = 6$ reaches, where 
$i = 1, 2, \ldots, n_1(h)$, are to be independently selected with probability 
proportional to size from within each cataloging unit selected at the first 
stage of the design. In this case, the expected frequency of including 
reach j contained in cataloging unit i in the sample is given by: 

$$E\{t(h,i,j)\} = n_2(h,i) \, \frac{S(h,i,j)}{\sum_{j=1}^{N_2(h,i)} S(h,i,j)}$$

where $S(h,i,j)$ is the length of reach j in cataloging unit i in stratum h, 
and $N_2(h,i)$ is the total number of all reaches in that cataloging unit. 
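
The expected-frequency formula above can be illustrated with a short Python sketch. The function name `expected_frequencies` and the reach lengths are hypothetical and chosen only for illustration; the sample size of 6 per cataloging unit is taken from the text.

```python
def expected_frequencies(lengths, n2=6):
    """Expected selection frequency n2 * S_j / sum(S) for each reach length S_j."""
    total = sum(lengths)
    if total <= 0:
        raise ValueError("total reach length must be positive")
    return [n2 * s / total for s in lengths]


# Hypothetical cataloging unit with four reaches (lengths in arbitrary units).
print(expected_frequencies([2.0, 1.0, 5.0, 4.0]))  # [1.0, 0.5, 2.5, 2.0]
```

The frequencies sum to the sample size of 6. A value above 1, as for the third reach here, means that the reach is expected to enter the sample more than once.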


In the minimum replacement procedure, the second-stage frame was first 
put in random order. In the large river stratum, the random order was es- 
tablished for the entire stratum. In the remainder of the area frame, 
reaches were randomly ordered within each sample cataloging unit. The sub- 
script j' is used below to denote the position of the reach in the randomly 
ordered array. For ease in presentation, the subscripts denoting strata 
(i.e., subscript h) and cataloging unit (i.e., subscript i) are dropped. 
Thus, with the notational change, $E\{t(j')\}$ is the expected frequency of 
selecting reach j' in the randomly ordered array, and $n_2 = 6$ sample reaches 
are to be selected from the $N_2$ available. 


Starting with the first reach in the randomly ordered array, the partial 
sums are computed: 

$$\sum_{j'=1}^{\ell} E\{t(j')\} = I(\ell) + F(\ell)$$

for $\ell = 1, 2, \ldots, N_2$, where $I(\ell)$ is the integral part of the 
partial sum, and $F(\ell)$ is the fractional part. The procedure for selecting 
reaches from a cataloging unit is sequential; adding each new reach to the 
partial sum creates the next step in the sequence. 
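
The split of each partial sum into an integral and a fractional part can be sketched in Python as below. The function name `partial_sums` and the input values are hypothetical. Exact rational arithmetic (`fractions.Fraction`) is an assumption of this sketch, used so that a partial sum that should be a whole number is not disturbed by floating-point rounding.

```python
from fractions import Fraction
from math import floor


def partial_sums(expected):
    """Return a list of (I, F) pairs: integral and fractional parts of each partial sum."""
    parts = []
    running = Fraction(0)
    for e in expected:
        running += Fraction(e)          # add the next expected frequency
        integral = floor(running)       # I(l)
        parts.append((integral, running - integral))  # F(l) = sum - I(l)
    return parts


# Hypothetical expected frequencies for four reaches in random order.
for step, (i_part, f_part) in enumerate(partial_sums([1.0, 0.5, 2.5, 2.0]), start=1):
    print(step, i_part, float(f_part))
```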


At each step $\ell$, let $n_2(\ell)$ denote the number of times the reach in 
position $\ell$ in the randomly ordered array will be selected. Then the total 
sample size selected to this point: 

$$\sum_{j'=1}^{\ell} n_2(j')$$

is constrained to equal one of two values, either $I(\ell)$ or $I(\ell)+1$. This 
constraint defines two mutually exclusive events, $C(\ell)$ and $\bar{C}(\ell)$, 
with the relationships: 

$$\text{if } C(\ell) = 1 \text{ then } \sum_{j'=1}^{\ell} n_2(j') = I(\ell) + 1$$

and 

$$\text{if } \bar{C}(\ell) = 1 \text{ then } \sum_{j'=1}^{\ell} n_2(j') = I(\ell)$$

Each of these events takes on a value of 0 or 1 such that when $C(\ell)=1$ 
then $\bar{C}(\ell)=0$ and vice versa. Because the two events are mutually 
exclusive, $P\{C(\ell)\}+P\{\bar{C}(\ell)\}=1$. At each step, the probabilities 
$P\{C(\ell)\}$ and $P\{\bar{C}(\ell)\}$ are computed, as described below, so as 
to be compatible with the expected frequencies, $E\{t(j')\}$. 


To initialize the procedure, at step $\ell = 0$, set: 

$$I(0) = 0$$
$$F(0) = 0$$
$$P\{C(0)\} = 0$$
$$P\{\bar{C}(0)\} = 1$$
$$C(0) = 0$$
$$\bar{C}(0) = 1$$
$$n_2(0) = 0$$


In each subsequent step, the probabilities assigned to the events $C(\ell)$ and 
$\bar{C}(\ell)$ are computed using the rules given below: 

                                                        $P\{C(\ell)\}$
Case  Deterministic condition       If $\bar{C}(\ell-1)=1$                   If $C(\ell-1)=1$
(1)   $F(\ell)=0$                   0                                        0
(2)   $F(\ell)>F(\ell-1)\ge 0$      $[F(\ell)-F(\ell-1)]/[1-F(\ell-1)]$      1
(3)   $F(\ell-1)\ge F(\ell)>0$      0                                        $F(\ell)/F(\ell-1)$


The value of $F(\ell)$, for $\ell = 1, 2, \ldots, N_2$, identifies which of the 
three cases defined in the second column has occurred at each step in the 
sequential procedure. The probability, $P\{C(\ell)\}$, assigned to the event 
$C(\ell)$, depends on which event was realized at the previous step, 
$C(\ell-1)$ or $\bar{C}(\ell-1)$. 


After the probabilities $P\{C(\ell)\}$ and thus $P\{\bar{C}(\ell)\} = 1-P\{C(\ell)\}$ 
have been computed, the event $C(\ell)$ or $\bar{C}(\ell)$ is chosen using 
a random number from a uniform distribution between 0 and 1. If the random 
number exceeds $P\{C(\ell)\}$, then $C(\ell)=0$ and $\bar{C}(\ell) = 1$, and the 
total sample size at that step must be $I(\ell)$ reaches. If the random number 
is less than $P\{C(\ell)\}$, then $C(\ell) = 1$ and $\bar{C}(\ell) = 0$, and the 
total sample size at that step must be $I(\ell) + 1$ reaches. At each step 
$\ell$, the reach in position $\ell$ in the randomly ordered array is selected 
as many times as required by the difference in sample sizes, 
$n_2(\ell)-n_2(\ell-1)$, defined by the realization of the event $C(\ell)$ or 
$\bar{C}(\ell)$. The selection process is complete when $\ell$ has reached the 
value $N_2$, at which point 

$$\sum_{j'=1}^{N_2} n_2(j') = n_2 = 6.$$
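
The sequential procedure described in this appendix can be sketched in Python as follows. This is an illustrative reading of the rules above, not the program used for the Survey. The function name `pmr_select`, the `rng` parameter, and the example reach lengths are hypothetical. The sketch assumes the reaches are already in random order and uses exact rational arithmetic so that the final partial sum equals the sample size exactly.

```python
import random
from fractions import Fraction
from math import floor


def pmr_select(sizes, n=6, rng=random.random):
    """Return the number of times each reach (already randomly ordered) is selected."""
    sizes = [Fraction(s) for s in sizes]
    total = sum(sizes)
    if total <= 0:
        raise ValueError("total size must be positive")
    hits = []
    running = Fraction(0)      # partial sum of expected frequencies
    f_prev = Fraction(0)       # F(l-1), initialized to F(0) = 0
    c_prev = 0                 # C(l-1), initialized to C(0) = 0
    taken = 0                  # total sample size after the previous step
    for s in sizes:
        running += n * s / total
        i_cur = floor(running)             # I(l)
        f_cur = running - i_cur            # F(l)
        if f_cur == 0:                     # case (1)
            p = Fraction(0)
        elif f_cur > f_prev:               # case (2)
            p = Fraction(1) if c_prev else (f_cur - f_prev) / (1 - f_prev)
        else:                              # case (3)
            p = f_cur / f_prev if c_prev else Fraction(0)
        c_cur = 1 if rng() < p else 0      # realize C(l) with probability p
        hits.append(i_cur + c_cur - taken) # selections of the reach at this step
        taken = i_cur + c_cur
        f_prev, c_prev = f_cur, c_cur
    return hits


if __name__ == "__main__":
    # Hypothetical reach lengths; expected frequencies are 1, 0.5, 2.5, and 2.
    print(pmr_select([2, 1, 5, 4], n=6))
```

In this example the first and fourth reaches are always selected once and twice, respectively, while the second and third reaches share the remaining three selections; the total is always 6.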
APPENDIX E. ESTIMATION PROCEDURE DETAILS 


DEFINITIONS AND NOTATION 


In order to describe the calculations required to compute the param- 
eter estimates, it is necessary to develop a notation that identifies each 
of the units of analysis with respect to the probability structure used in 
the design and selection of the sample. In this case, the unit of analysis 
is a reach. Reaches are also the ultimate sampling units. However, some 
notational complexity arises because of the different designs used to se- 
lect the sample of reaches from the large river stratum versus the re- 
mainder of the area frame. In the latter case, a two-stage design was 
used, whereas the large river reaches were selected in a single stage. 


In general, reaches are classified into cataloging units. Each cataloging 
unit is, in turn, classified into one of the design strata used to 
control the distribution of the sample. Thus, reaches are uniquely 
identified by three subscripts, specifying, in order, the stratum, the 
cataloging unit within the stratum, and, finally, the reach within the 
cataloging unit. The complication is introduced by the fact that cataloging 
unit identification is ignored in selecting the sample of reaches from the 
large river stratum. 


Strata are denoted by the subscript: 

$h = 1, 2, \ldots, 14$

where $h = 14$ denotes the large river stratum. 


Cataloging units are denoted by the subscript i. The range of subscript 
values: 

$i = 1, 2, \ldots, N_1(h)$

for $h = 1, 2, \ldots, 14$, denotes cataloging units in the frame (i.e., the 
total population), where $N_1(h)$ is the total number of cataloging units 
classified into stratum h. 

The range of subscript values: 

$i = 1, 2, \ldots, n_1(h)$

for $h = 1, 2, \ldots, 13$, denotes cataloging units in the sample. Specifically, 
$n_1(h)$ is the first stage sample size allocated to stratum h for 
$h = 1, 2, \ldots, 13$. Because the samples were selected with probability 
proportional to size and with minimum replacement, the sample size, $n_1(h)$, 
may be greater than the number of distinct cataloging units in the sample. 


Reaches are denoted by the subscript j. In general, the total number 
of reaches in cataloging unit i contained in stratum h can be denoted by 
$N_2(h,i)$. The total number of reaches in stratum h is given by: 

$$N_2(h,+) = \sum_{i=1}^{N_1(h)} N_2(h,i)$$


Except for the large river stratum, reaches in each cataloging unit of 
the frame are denoted by: 

$j = 1, 2, \ldots, N_2(h,i)$

for $h = 1, 2, \ldots, 13$ and $i = 1, 2, \ldots, N_1(h)$. Because the cataloging 
unit identifiers were not used in selecting the sample in the large river 
stratum, reaches in the frame are denoted by: 

$j = 1, 2, \ldots, N_2(h,+)$

for $h = 14$. 


In the nonlarge river strata, a sample of $n_2(h,i)$ reaches was selected 
from (sample) cataloging unit i in stratum h. Thus: 

$j = 1, 2, \ldots, n_2(h,i)$

for $h = 1, 2, \ldots, 13$ and $i = 1, 2, \ldots, n_1(h)$, denotes reaches in the 
sample. Reaches in that part of the frame represented in the sample of 
cataloging units are denoted by: 

$j = 1, 2, \ldots, N_2(h,i)$

for $h = 1, 2, \ldots, 13$ and $i = 1, 2, \ldots, n_1(h)$. 


In the large river stratum, a total sample size of $n_2(14,+)$ reaches
was selected. Thus:

$$j = 1, 2, \ldots, n_2(h,+)$$

for h = 14, denotes reaches in the sample selected from the large river
stratum.

Again, because reaches were selected with probability proportional to
size and with minimum replacement, there is a chance of selecting very
large reaches more than once. Therefore, the sample sizes, $n_2(h,+)$ and
$n_2(h,i)$, may be greater than the number of distinct reaches.


Each reach in the frame has a length, or size, which can be determined
for any reach in question and which, for reach j in cataloging unit i in
stratum h, is denoted by $S(h,i,j)$ for $h = 1, 2, \ldots, 14$;
$i = 1, 2, \ldots, N_1(h)$; and $j = 1, 2, \ldots, N_2(h,i)$.


The size of cataloging unit i is defined as the sum of the sizes of
the reaches contained in the cataloging unit, written as:

$$S(h,i,+) = \sum_{j=1}^{N_2(h,i)} S(h,i,j)$$

for $h = 1, 2, \ldots, 14$ and $i = 1, 2, \ldots, N_1(h)$. Similarly, the
size of stratum h is defined as the sum of the sizes of the reaches
contained in the stratum, written as:

$$S(h,+,+) = \sum_{i=1}^{N_1(h)} \sum_{j=1}^{N_2(h,i)} S(h,i,j)$$

for $h = 1, 2, \ldots, 14$.
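
As a minimal sketch of these two size definitions, the following Python
fragment sums reach sizes into cataloging unit and stratum sizes. The
nested-dictionary layout, the function names, and the reach lengths are
illustrative assumptions, not part of the survey's data files.

```python
# Hypothetical frame: stratum h -> cataloging unit i -> list of reach sizes S(h,i,j).
# All lengths are made-up illustration values, not survey data.
frame = {
    1: {1: [2.0, 3.5, 1.5], 2: [4.0, 6.0]},
    14: {1: [10.0, 20.0]},
}


def unit_size(frame, h, i):
    """S(h,i,+): sum of the reach sizes in cataloging unit i of stratum h."""
    return sum(frame[h][i])


def stratum_size(frame, h):
    """S(h,+,+): sum of the reach sizes over every cataloging unit in stratum h."""
    return sum(unit_size(frame, h, i) for i in frame[h])
```

Both quantities feed the probability proportional to size selection, so in
practice they would be computed once from the frame and stored alongside it.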


Each reach in the frame has associated with it a set of observable
quantities, or response variables, for which values were sought for the
sample reaches. Members of the set of response variables are denoted by
the subscript:

$$c = 1, 2, \ldots, C$$

In general, the response variables can be denoted by $Y_c$ and their values
by $y_c(h,i,j)$. The conventions adopted above to differentiate between the
large river stratum and the remainder of the area frame are continued with
respect to the response variables and their values.


It is likely that values cannot be obtained for at least some of the
response variables for at least some of the sample reaches, thereby
incurring some level of missing data. For the large river stratum, the
sample reaches for which a value for response variable c was obtained are
denoted by:

$$j = 1, 2, \ldots, m_c(14,+)$$

for $c = 1, 2, \ldots, C$. If no missing data problem is encountered:

$$m_c(14,+) = n_2(14,+)$$

over all values of the c-subscript. Similarly, for the remainder of the
area frame, sample reaches for which a value for response variable c was
obtained are denoted by:

$$j = 1, 2, \ldots, m_c(h,i)$$

for $c = 1, 2, \ldots, C$; $h = 1, 2, \ldots, 13$; and
$i = 1, 2, \ldots, n_1(h)$. As before, in the absence of any missing data:

$$m_c(h,i) = n_2(h,i)$$

over the relevant values of the various subscripts.


CALCULATION OF SAMPLING WEIGHTS 





Large River Stratum 


The sampling weight for reaches in this stratum is given by:

$$W(14,+,j) = \left[ n_2(14,+) \, \frac{S(14,+,j)}{S(14,+,+)} \right]^{-1}$$

for $j = 1, 2, \ldots, N_2(14,+)$, which is the reciprocal of the expected
frequency with which reach j is selected in samples of size $n_2(14,+)$,
using the probability proportional to size with minimum replacement
randomization procedure.
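
The weight above can be sketched in a few lines of Python. The function
name and the numeric inputs are hypothetical and chosen only to make the
arithmetic easy to follow.

```python
def large_river_weight(n_sample, reach_size, stratum_size):
    """W(14,+,j): reciprocal of the expected selection frequency of reach j.

    n_sample     -- n_2(14,+), number of reach selections in the stratum
    reach_size   -- S(14,+,j), size (length) of reach j
    stratum_size -- S(14,+,+), total size of the stratum
    """
    expected_hits = n_sample * reach_size / stratum_size
    return 1.0 / expected_hits


# Made-up example: 20 selections, a reach of size 5 in a stratum of size 1,000.
w_example = large_river_weight(20, 5.0, 1000.0)  # expected hits 0.1, weight about 10
```

A reach large enough that its expected selection frequency exceeds 1 gets a
weight below 1, which matches the remark that very large reaches may be
selected more than once.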


Remainder of the Area Frame 





The sample for the remainder of the area frame was selected in two
stages. At the first stage of sampling, a sample of cataloging units was
selected with probability proportional to size and with minimum
replacement. Thus, the first stage component of the weight is:

$$W(h,i) = \left[ n_1(h) \, \frac{S(h,i,+)}{S(h,+,+)} \right]^{-1}$$

for $h = 1, 2, \ldots, 13$ and $i = 1, 2, \ldots, N_1(h)$.


At the second stage of sampling, reaches were selected conditionally
within first stage selections with probability proportional to size and
with minimum replacement. Hence, the second stage component of the
sampling weight is:

$$W(j \mid h,i) = \left[ n_2(h,i) \, \frac{S(h,i,j)}{S(h,i,+)} \right]^{-1}$$

for $h = 1, 2, \ldots, 13$; $i = 1, 2, \ldots, n_1(h)$; and
$j = 1, 2, \ldots, N_2(h,i)$.

The total (or overall) sampling weight is the product of the two
quantities, namely:

$$W(h,i,j) = W(j \mid h,i) \, W(h,i)$$

for $h = 1, 2, \ldots, 13$; $i = 1, 2, \ldots, n_1(h)$; and
$j = 1, 2, \ldots, N_2(h,i)$. Algebraically, forming the above product
allows the cancellation of the term $S(h,i,+)$, the size of cataloging unit
i in stratum h. However, its value was determined first in constructing the
first-stage frame, and then again (more precisely) in constructing the
second-stage frame. The respective (slightly different) values were used in
the two stages of the sample selection process, so these values should be
retained separately in the weight calculations.
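
The point about retaining the two cataloging unit sizes separately can be
illustrated with a short Python sketch. The function names and all numbers
are hypothetical; the only feature taken from the text is that the
first-stage and second-stage values of the unit size may differ slightly.

```python
def first_stage_weight(n1, unit_size_stage1, stratum_size):
    """W(h,i), using the unit size recorded in the first-stage frame."""
    return stratum_size / (n1 * unit_size_stage1)


def second_stage_weight(n2, reach_size, unit_size_stage2):
    """W(j|h,i), using the unit size recorded in the second-stage frame."""
    return unit_size_stage2 / (n2 * reach_size)


def overall_weight(n1, unit_size_stage1, stratum_size, n2, reach_size, unit_size_stage2):
    """W(h,i,j) = W(j|h,i) * W(h,i), with no cancellation of the unit size."""
    return (second_stage_weight(n2, reach_size, unit_size_stage2)
            * first_stage_weight(n1, unit_size_stage1, stratum_size))


# Made-up example: the unit size was 50 in the first-stage frame and 52 when
# remeasured for the second-stage frame.
w_separate = overall_weight(10, 50.0, 2000.0, 2, 5.0, 52.0)  # 4.0 * 5.2
w_cancelled = 2000.0 / (10 * 2 * 5.0)                        # what cancelling would give
```

With these inputs the two results differ (about 20.8 versus 20.0), which is
the discrepancy that cancelling the unit size would hide.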


MISSING DATA COMPENSATION 


The values $m_c(h,+)$ and $m_c(h,i)$ were defined as the total number of
sample reaches (counting multiply selected reaches each time they occur)
for which a value for response variable c was obtained. In general:

$$\sum_{j=1}^{m_c(h,i)} W(h,i,j) \neq \sum_{j=1}^{n_2(h,i)} W(h,i,j)$$

unless $m_c(h,i) = n_2(h,i)$; i.e., unless a response variable value is
obtained for every sample reach in cataloging unit i in stratum h.
Otherwise:

$$m_c(h,i) < n_2(h,i)$$

and the potential for missing data biases is introduced.


Weighting class adjustments were proposed (and used) for missing data
compensation for this survey. For the large river stratum, the weighting
class was defined as the set of all sample reaches in the stratum. For
the remainder of the area frame, weighting classes were defined as the set
of sample reaches in the same (sample) cataloging unit. If there were so
much missing data that such a weighting class did not exist (not the case
for the National Fisheries Survey), different weighting classes would need
to be defined, perhaps involving geographically proximal cataloging units
in the same stratum.


Regardless of the weighting class definition, the calculation requires
multiplying the sampling weights for reaches supplying a response variable
value by the ratio of the sum of the sampling weights of all sample reaches
in the weighting class to the sum of the sampling weights of those reaches
in the weighting class supplying a response variable value. In other words,
this is equivalent to replacing the missing data by the average of all
available data in the same weighting class. For the large river stratum,
the calculation is:

$$w_c(14,+,j) = W(14,+,j) \, \frac{\sum_{j'=1}^{n_2(14,+)} W(14,+,j')}{\sum_{j'=1}^{m_c(14,+)} W(14,+,j')}$$

for $c = 1, 2, \ldots, C$ and $j = 1, 2, \ldots, m_c(14,+)$. For the
remainder of the area frame, the calculation is:

$$w_c(h,i,j) = W(h,i,j) \, \frac{\sum_{j'=1}^{n_2(h,i)} W(h,i,j')}{\sum_{j'=1}^{m_c(h,i)} W(h,i,j')}$$

for $c = 1, 2, \ldots, C$; $h = 1, 2, \ldots, 13$;
$i = 1, 2, \ldots, n_1(h)$; and $j = 1, 2, \ldots, m_c(h,i)$.
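
A minimal Python sketch of the weighting class adjustment follows. The
function name, the list-based inputs, and the example weights are
assumptions made for illustration; one call handles one weighting class
and one response variable.

```python
def adjust_weights(weights, responded):
    """Weighting class adjustment within a single weighting class.

    weights   -- sampling weights W for every sample reach in the class
                 (a reach selected more than once is listed each time)
    responded -- parallel list of booleans, True where a value was obtained
    Returns the adjusted weights w_c, with None for reaches lacking a value.
    """
    total_all = sum(weights)
    total_resp = sum(w for w, r in zip(weights, responded) if r)
    if total_resp == 0:
        # No usable reach in the class: a broader weighting class is needed.
        raise ValueError("weighting class has no responding reaches")
    factor = total_all / total_resp
    return [w * factor if r else None for w, r in zip(weights, responded)]


# Made-up example: four sample reaches, the third has no value.
W = [10.0, 20.0, 30.0, 40.0]
got_value = [True, True, False, True]
adjusted = adjust_weights(W, got_value)
```

The adjusted weights of the responding reaches sum to the same total as the
original sampling weights of the whole class (100 in this example), which is
the property the ratio is designed to preserve.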


REPLICATED VERSUS POOLED ESTIMATES 


The original sample design for this survey was a two-phase or double
sampling design, originally intended for use in assessing certain
measurement biases using difference estimators. In order to provide
unbiased variance estimates for the difference estimators, the entire
design was replicated. The requirement for the measurement error
assessment was subsequently dropped, but not until the replication feature
had been included in the selection of the sample.


Consequently, there is a choice of estimation procedures: either use
the replication feature or pool the separate replicates. If the subscript
r = 1,2,3,4 is used to denote the replicates, each of the quantities
defined to this point can be identified as to replicate. For example,
$w_{cr}(h,i,j)$ denotes the adjusted weight for estimating a parameter
based on response variable c obtained for reach j, contained in cataloging
unit i and stratum h, and belonging to replicate r. Because the replicates
were independently selected, the same reach may appear in more than one
replicate.


Given that the minimum replacement procedure was applied independently
to the replicates, a reasonable model to use in pooling replicates is that
of a probability proportional to size, with-replacement selection of all
units (i.e., reaches alone in the large river stratum, both reaches and
cataloging units in the remainder of the area frame). Under this model,
unbiased variance estimates can be computed using the differences among
primary points (i.e., among the 79 reach selections from the large river
stratum, plus the 204 cataloging unit selections from the remainder of the
area frame). Subtracting the number of strata (i.e., 14) from the sum of
79 and 204 yields 269 degrees of freedom, for which the Student's t is
approximately 2 at the 5% significance level.


Unbiased variance estimation under the replicated design, on the other
hand, has but 3 degrees of freedom (i.e., the total number of replicates
minus one), for which the Student's t is 3.182 at the 5% significance
level. Thus, a two-thirds reduction in the variance estimate using the
replicated design would be needed to make this more efficient than the
pooled approximation. Although a numerical comparison of the two
approaches was not made for this survey, general survey research experience
suggests that the pooled approximation is likely to be the more efficient
estimation procedure. This, in fact, was the procedure used.
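
The degrees-of-freedom arithmetic and the break-even comparison can be
checked with a short Python fragment. It assumes, as one plausible reading,
that efficiency is judged by the half-width of a confidence interval, that
is, t times the square root of the variance estimate. The variable names are
illustrative, and the t-values are the ones quoted in the text.

```python
# Counts quoted in the text.
reach_selections = 79    # large river stratum
unit_selections = 204    # remainder of the area frame
strata = 14
replicates = 4

pooled_df = reach_selections + unit_selections - strata   # 269
replicated_df = replicates - 1                            # 3

t_pooled = 2.0        # approximate Student's t for 269 degrees of freedom
t_replicated = 3.182  # Student's t for 3 degrees of freedom

# Assumption: equal interval half-widths require
#   t_replicated**2 * var_replicated == t_pooled**2 * var_pooled,
# so the replicated variance must not exceed this fraction of the pooled one.
break_even_ratio = (t_pooled / t_replicated) ** 2
```

Under that assumption the ratio comes to roughly 0.4, so the replicated
variance estimate would have to be about 60 percent smaller than the pooled
one, which is in line with the reduction of about two-thirds cited above.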


In summary, except for variances, the form of the estimators is the 
same using either approach. For the replicated design, the estimation 
formulas described in the remainder of this Appendix can be applied separ- 
ately to each replicate and the results averaged to obtain the parameter 
estimate of interest. 


Variance estimates are, of course, computed quite differently for the
two approaches. The variance estimators for the pooled design are
described in the remainder of this Appendix. For the replicated design,
where the parameter estimate is the mean of the separate replicate
estimates, the variance estimator is simply:

$$\mathrm{Var}\{\bar{y}\} = \frac{1}{R[R-1]} \sum_{r=1}^{R} \left[ y_r - \bar{y} \right]^2$$

where $y_r$ = the estimate obtained from replicate r
      $\bar{y}$ = the mean of the R replicate estimates

There are a total of R = 4 replicates for this design.
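
A small Python sketch of the replicated estimator follows. The function
name and the four replicate estimates are hypothetical values used only to
show the calculation.

```python
def replicated_estimate_and_variance(replicate_estimates):
    """Mean of the replicate estimates and its variance estimate.

    The variance is the sum of squared deviations of the replicate
    estimates from their mean, divided by R * (R - 1).
    """
    R = len(replicate_estimates)
    if R < 2:
        raise ValueError("at least two replicates are required")
    mean = sum(replicate_estimates) / R
    ss = sum((y - mean) ** 2 for y in replicate_estimates)
    return mean, ss / (R * (R - 1))


# Made-up estimates from R = 4 replicates.
rep_mean, rep_var = replicated_estimate_and_variance([10.0, 12.0, 11.0, 13.0])
```

With these inputs the mean is 11.5 and the sum of squared deviations is 5.0,
so the variance estimate is 5/12.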


ESTIMATION OF TOTALS AND ASSOCIATED VARIANCES 


A population total is defined as the sum of the values of a response
variable over all the units in the population. For example, the total in
the population of reaches of the response variable, $Y_c$, is given by:

$$T_c = \sum_{h=1}^{14} \sum_{i=1}^{N_1(h)} \sum_{j=1}^{N_2(h,i)} y_c(h,i,j)$$

for $c = 1, 2, \ldots, C$. An unbiased estimate of the total is obtained by
summing the products of the sampling weights and the response variable
values obtained for the units in the sample. Thus:

$$\hat{T}_c = \sum_{h=1}^{13} \sum_{i=1}^{n_1(h)} \sum_{j=1}^{m_c(h,i)} w_c(h,i,j) \, y_c(h,i,j) + \sum_{j=1}^{m_c(14,+)} w_c(14,+,j) \, y_c(14,+,j)$$

for $c = 1, 2, \ldots, C$, is an unbiased estimate of the parameter, $T_c$.


The value:

$$\hat{T}_c(h,i) = \sum_{j=1}^{m_c(h,i)} w_c(h,i,j) \, y_c(h,i,j)$$

for $h = 1, 2, \ldots, 13$ and $i = 1, 2, \ldots, n_1(h)$, can be thought
of as the contribution made to the estimated total, $\hat{T}_c$, by sample
cataloging unit i in stratum h. Similarly, in the large river stratum the
value:

$$\hat{T}_c(14,j) = w_c(14,+,j) \, y_c(14,+,j)$$

for $j = 1, 2, \ldots, m_c(14,+)$, is the contribution of sample reach j in
the stratum to the estimated total. The variance estimator, using the
pooled approximation, makes use of these quantities.


The form of the variance estimator is made clear by defining the
subscript:

$$k = 1, 2, \ldots, K(h)$$

where

$$K(h) = \begin{cases} m_c(h,+) & \text{if the k-subscript denotes reaches in the large river stratum} \\ n_1(h) & \text{if the k-subscript denotes cataloging units in the remainder of the area frame} \end{cases}$$

Using this notation, the estimated total can be rewritten as:

$$\hat{T}_c = \sum_{h=1}^{14} \sum_{k=1}^{K(h)} \hat{T}_c(h,k)$$

The variance estimate is given by:

$$\mathrm{Var}\{\hat{T}_c\} = \sum_{h=1}^{14} \frac{K(h)}{K(h)-1} \sum_{k=1}^{K(h)} \left[ \hat{T}_c(h,k) - \frac{1}{K(h)} \sum_{k'=1}^{K(h)} \hat{T}_c(h,k') \right]^2$$

The expression in the brackets is the difference between the unit level
contribution to the stratum total and the stratum average of the
contributions. These differences are squared and summed over the sample
units in the stratum. Multiplying this sum by the number of sample units in
the stratum, and dividing it by this number minus one, provides the stratum
level contribution to the total variance. Summing over all strata produces
the required variance estimate.
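
The steps just described can be sketched in Python as follows. The function
name, the dictionary layout, and the contribution values are illustrative
assumptions. Each list holds the unit level contributions for one stratum,
with repeated selections of the same unit listed separately.

```python
def pooled_total_and_variance(contributions_by_stratum):
    """Estimated total and its pooled variance estimate.

    contributions_by_stratum -- dict mapping stratum h to the list of unit
        level contributions for k = 1, ..., K(h)
    """
    total = 0.0
    variance = 0.0
    for contribs in contributions_by_stratum.values():
        K = len(contribs)
        if K < 2:
            raise ValueError("each stratum needs at least two sample units")
        total += sum(contribs)
        mean = sum(contribs) / K                      # stratum average contribution
        ss = sum((t - mean) ** 2 for t in contribs)   # squared differences, summed
        variance += K / (K - 1) * ss                  # stratum level contribution
    return total, variance


# Made-up contributions for one ordinary stratum and the large river stratum.
example = {1: [100.0, 120.0, 110.0], 14: [50.0, 70.0]}
t_hat, var_t_hat = pooled_total_and_variance(example)
```

Here stratum 1 contributes (3/2)(200) = 300 and stratum 14 contributes
(2/1)(200) = 400, giving an estimated total of 450 with a variance estimate
of 700.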


ESTIMATION OF MEANS, PROPORTIONS, AND ASSOCIATED VARIANCES 


The average value of a response variable over the population of
reaches is defined as:

$$\bar{Y}_c = \frac{\sum_{h=1}^{14} \sum_{i=1}^{N_1(h)} \sum_{j=1}^{N_2(h,i)} y_c(h,i,j)}{\sum_{h=1}^{14} \sum_{i=1}^{N_1(h)} N_2(h,i)} = \frac{T_c}{N}$$

where $T_c$ = the total of the response variable values in the population,
              as in the previous section
      N = the total number of reaches in the population

If the response variable, $Y_c$, is categorical (taking on the values 1 or
0 depending on whether or not the reach belongs to the category of
interest), the above expression defines a proportion, denoted by $P_c$.
In this sampling frame, the total number of reaches, N, is unknown and
must be estimated from the sample data, along with the numerator total,
$T_c$. The resulting (ratio) estimator, denoted by:

$$\hat{\bar{Y}}_c = \frac{\hat{T}_c}{\hat{N}}$$

(or $\hat{P}_c$ in the case of proportions) is in general not unbiased.
However, the magnitude of the bias is inversely related to the sample size
and is usually unimportant in even moderately sized samples.


For this study, estimates expressed in terms of per mile of reach (or
some other unit of measurement), as well as per reach, are also required.
In general, such ratio estimators can be expressed as:

$$\hat{R}_c = \frac{\hat{T}_c}{\hat{T}_{c'}}$$

where the subscript c denotes the response variable defining the numerator
total and the subscript c' denotes the response variable defining the
denominator total for the ratio of interest. The numerator and denominator
totals are estimated separately and then divided to obtain the estimated
ratio.


Because the ratio estimate is a nonlinear statistic, its variance
cannot be expressed in closed form. A first-order Taylor series
linearization is proposed as an approximation. From a computational
viewpoint, the linearization is most easily applied at the level of the
response variable values. To this end, define values, $z_c$, as:

$$z_c(14,+,j) = y_c(14,+,j) - \hat{R}_c \, y_{c'}(14,+,j)$$

for $j = 1, 2, \ldots, m_c(14,+)$, for responses obtained from the large
river stratum, and:

$$z_c(h,i,j) = y_c(h,i,j) - \hat{R}_c \, y_{c'}(h,i,j)$$

for $h = 1, 2, \ldots, 13$; $i = 1, 2, \ldots, n_1(h)$; and
$j = 1, 2, \ldots, m_c(h,i)$, for responses obtained from the remainder of
the area frame. Using $z_c$ in place of $y_c$ in the previous section,
compute the quantities:

$$\hat{T}_{z_c}(h,k) \quad \text{and} \quad \mathrm{Var}\{\hat{T}_{z_c}\}$$

The required variance estimate is then given by:

$$\mathrm{Var}\{\hat{R}_c\} = \frac{\mathrm{Var}\{\hat{T}_{z_c}\}}{\hat{T}_{c'}^{\,2}}$$
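
The ratio estimate and its linearized variance can be sketched in Python as
below. The record layout (stratum, first-stage unit, adjusted weight,
numerator value, denominator value), the function name, and the data are
assumptions for illustration. In the large river stratum each reach
selection is given its own unit label k. Setting every denominator value to
1 would give a per-reach mean, which assumes the number of reaches is
estimated by the sum of the weights.

```python
from collections import defaultdict


def ratio_and_variance(records):
    """Ratio of two estimated totals and its Taylor-linearized variance.

    records -- list of tuples (h, k, w, y_num, y_den); k labels the
        first-stage selection (cataloging unit, or reach when h is 14)
    """
    t_num = sum(w * yn for _, _, w, yn, _ in records)
    t_den = sum(w * yd for _, _, w, _, yd in records)
    ratio = t_num / t_den

    # Unit level contributions of the linearized values z = y_num - ratio * y_den.
    contrib = defaultdict(lambda: defaultdict(float))
    for h, k, w, yn, yd in records:
        contrib[h][k] += w * (yn - ratio * yd)

    # Pooled variance of the linearized total (assumes K >= 2 in every stratum).
    var_tz = 0.0
    for units in contrib.values():
        K = len(units)
        mean = sum(units.values()) / K
        var_tz += K / (K - 1) * sum((t - mean) ** 2 for t in units.values())

    return ratio, var_tz / t_den ** 2


# Made-up records: two units in stratum 1 and two reaches in stratum 14.
records = [
    (1, 1, 10.0, 4.0, 2.0),
    (1, 2, 10.0, 6.0, 2.0),
    (14, 1, 5.0, 8.0, 4.0),
    (14, 2, 5.0, 12.0, 4.0),
]
r_hat, var_r_hat = ratio_and_variance(records)
```

For these inputs the estimated totals are 200 and 80, the ratio is 2.5, the
linearized total has variance estimate 800, and the ratio's variance
estimate is 800 / 80^2 = 0.125.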
ESTIMATION OF REGRESSION RELATIONS


Regression relations are used in assessing the ability of one set of
variables to predict the values of another set. If the previously developed
notation is used, the response variables in either set are denoted using
particular values of the subscript, c. This notation emphasizes that the
same sampling error structure is associated with all the response variables
in the survey, regardless of whether they are predicted or predictor
variables in the regression. However, the more usual notation, in which the
set of variables being predicted is denoted by Y and the predictor set by
X, is used in this section for clarity and ease in presentation. In the
following, a letter with an underscore (e.g., $\underline{x}$) represents a
column vector while the same symbol with a prime (e.g., $\underline{x}'$)
represents the transposed or row vector.


At the level of an individual reach in the population, a (linear)
regression relation is defined by:

$$y(h,i,j) = \underline{x}'(h,i,j) \, \underline{\beta} + e(h,i,j)$$

for $h = 1, 2, \ldots, 14$; $i = 1, 2, \ldots, N_1(h)$; and
$j = 1, 2, \ldots, N_2(h,i)$, where $y(h,i,j)$ is the response variable
value associated with reach j in cataloging unit i and stratum h. The
response variable value is to be predicted from $\underline{x}'(h,i,j)$, a
vector of response variable values for the same reach, using
$\underline{\beta}$, a vector of (regression) coefficients. The "error"
term:

$$e(h,i,j) = y(h,i,j) - \underline{x}'(h,i,j) \, \underline{\beta}$$

expresses the failure of the prediction to be exact. The definition of the
vector $\underline{\beta}$, which minimizes the sum of squared deviations:

$$\sum_{h=1}^{14} \sum_{i=1}^{N_1(h)} \sum_{j=1}^{N_2(h,i)} e^2(h,i,j)$$

is given by (assuming a model of full rank):

$$\underline{\beta} = \left[ \sum_{h=1}^{14} \sum_{i=1}^{N_1(h)} \sum_{j=1}^{N_2(h,i)} \underline{x}(h,i,j) \, \underline{x}'(h,i,j) \right]^{-1} \left[ \sum_{h=1}^{14} \sum_{i=1}^{N_1(h)} \sum_{j=1}^{N_2(h,i)} \underline{x}(h,i,j) \, y(h,i,j) \right]$$

In the usual matrix notation, the definition of the parameter vector is:

$$\underline{\beta} = [X'X]^{-1} [X'Y]$$

The prior expression, however, emphasizes the (finite) population basis of
the definition. Although the average value of the deviations in the
population (i.e., the expected value of the deviations) is zero, there is
no necessity that the individual e(h,i,j)-values vanish when averaged over
repeated observations of the same reach.


The definition of $\underline{\beta}$ can be considered the multivariate
extension of the ratio defined in the previous section. In this context the
quantity, X'X, is the population "total" defining the "denominator," and
X'Y is the population "total" defining the "numerator." Unbiased estimates
of these quantities are given, respectively, by:

$$\widehat{X'X} = \sum_{h=1}^{13} \sum_{i=1}^{n_1(h)} \sum_{j=1}^{m(h,i)} w(h,i,j) \, \underline{x}(h,i,j) \, \underline{x}'(h,i,j) + \sum_{j=1}^{m(14,+)} w(14,+,j) \, \underline{x}(14,+,j) \, \underline{x}'(14,+,j)$$

and

$$\widehat{X'Y} = \sum_{h=1}^{13} \sum_{i=1}^{n_1(h)} \sum_{j=1}^{m(h,i)} w(h,i,j) \, \underline{x}(h,i,j) \, y(h,i,j) + \sum_{j=1}^{m(14,+)} w(14,+,j) \, \underline{x}(14,+,j) \, y(14,+,j)$$

The values, m(h,i) and m(14,+), are the numbers of reaches supplying a
complete set of multivariate values. It follows that the missing data
compensation procedure needs to be modified to accommodate the situation
where one or more of the multivariate values is missing.


As was the case for the ratio estimate, the estimate
$\hat{\underline{\beta}}$ is nonlinear, and a first-order Taylor series
linearization is proposed to obtain an approximate variance-covariance
matrix for the $\hat{\underline{\beta}}$-values. The linearized variables
in this case are vector quantities defined for the large river stratum by:

$$\underline{z}(14,+,j) = \underline{x}(14,+,j) \left[ y(14,+,j) - \underline{x}'(14,+,j) \, \hat{\underline{\beta}} \right]$$

for $j = 1, 2, \ldots, m(14,+)$, and over the remainder of the area frame
by:

$$\underline{z}(h,i,j) = \underline{x}(h,i,j) \left[ y(h,i,j) - \underline{x}'(h,i,j) \, \hat{\underline{\beta}} \right]$$

for $h = 1, 2, \ldots, 13$; $i = 1, 2, \ldots, n_1(h)$; and
$j = 1, 2, \ldots, m(h,i)$. The vector quantities:

$$\hat{\underline{T}}_z(14,j) = w(14,+,j) \, \underline{z}(14,+,j)$$

for $j = 1, 2, \ldots, m(14,+)$ and:

$$\hat{\underline{T}}_z(h,i) = \sum_{j=1}^{m(h,i)} w(h,i,j) \, \underline{z}(h,i,j)$$

for $h = 1, 2, \ldots, 13$ and $i = 1, 2, \ldots, n_1(h)$ are computed.
Using the previously defined subscript, k, allows the single vector
$\hat{\underline{T}}_z(h,k)$ to be used in the formulation.


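The $\hat{\underline{T}}_z(h,k)$-values are then used in the multivariate
extension of the variance estimate for totals given previously.
Specifically, the (first-stage unit level) matrices:

$$\Delta(h,k) = \left[ \hat{\underline{T}}_z(h,k) - \frac{1}{K(h)} \sum_{k'=1}^{K(h)} \hat{\underline{T}}_z(h,k') \right] \left[ \hat{\underline{T}}_z(h,k) - \frac{1}{K(h)} \sum_{k'=1}^{K(h)} \hat{\underline{T}}_z(h,k') \right]'$$

are computed. These matrices are then summed over the first stage units
(i.e., values of the k-subscript) in each stratum and over strata to yield
the matrix:

$$\mathrm{Var}\{\hat{\underline{T}}_z\} = \sum_{h=1}^{14} \frac{K(h)}{K(h)-1} \sum_{k=1}^{K(h)} \Delta(h,k)$$

The required variance-covariance matrix of the
$\hat{\underline{\beta}}$-values is then:

$$\mathrm{Var}\{\hat{\underline{\beta}}\} = [\widehat{X'X}]^{-1} \, \mathrm{Var}\{\hat{\underline{T}}_z\} \, [\widehat{X'X}]^{-1}$$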
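
The full sequence (weighted cross-product totals, coefficient estimate,
linearized contributions, and the final matrix product) can be sketched in
plain Python for the simplest case of an intercept and one predictor, so
that every matrix is 2 by 2. The record layout, the function names, and the
data are hypothetical, and the sketch assumes at least two first-stage
selections in every stratum.

```python
from collections import defaultdict


def inv2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(p, q):
    """Product of two 2 x 2 matrices."""
    return [[sum(p[r][t] * q[t][c] for t in range(2)) for c in range(2)]
            for r in range(2)]


def weighted_regression(records):
    """Weighted fit of y = b0 + b1 * x with a linearized covariance matrix.

    records -- list of (h, k, w, x, y); k labels the first-stage selection
    """
    xx = [[0.0, 0.0], [0.0, 0.0]]   # estimate of X'X
    xy = [0.0, 0.0]                 # estimate of X'Y
    for h, k, w, x, y in records:
        v = (1.0, x)                # predictor vector with intercept term
        for r in range(2):
            xy[r] += w * v[r] * y
            for c in range(2):
                xx[r][c] += w * v[r] * v[c]
    xx_inv = inv2(xx)
    beta = [xx_inv[r][0] * xy[0] + xx_inv[r][1] * xy[1] for r in range(2)]

    # Unit level sums of w * x-vector * residual (the linearized contributions).
    contrib = defaultdict(lambda: defaultdict(lambda: [0.0, 0.0]))
    for h, k, w, x, y in records:
        resid = y - (beta[0] + beta[1] * x)
        contrib[h][k][0] += w * resid
        contrib[h][k][1] += w * x * resid

    # Sum over strata of K/(K-1) times the outer products of the deviations.
    var_t = [[0.0, 0.0], [0.0, 0.0]]
    for units in contrib.values():
        K = len(units)
        mean = [sum(t[r] for t in units.values()) / K for r in range(2)]
        for t in units.values():
            d = [t[0] - mean[0], t[1] - mean[1]]
            for r in range(2):
                for c in range(2):
                    var_t[r][c] += K / (K - 1) * d[r] * d[c]

    var_beta = matmul2(matmul2(xx_inv, var_t), xx_inv)
    return beta, var_beta


# Made-up records that lie exactly on the line y = 1 + 2x.
exact = [(1, 1, 10.0, 0.0, 1.0), (1, 2, 10.0, 1.0, 3.0),
         (14, 1, 5.0, 2.0, 5.0), (14, 2, 5.0, 3.0, 7.0)]
beta_hat, var_beta_hat = weighted_regression(exact)
```

Because the example data fall exactly on a line, the fitted coefficients are
1 and 2 up to rounding and the estimated covariance matrix is essentially
zero. Data with scatter would give a nonzero, symmetric matrix.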